TermEval 2020: TALN-LS2N System for Automatic Term Extraction

Amir Hazem; Mérième Bouhandi; Florian Boudin; Béatrice Daille

Abstract

Automatic terminology extraction is a notoriously difficult task that aims to reduce the effort required to manually identify terms in domain-specific corpora by automatically providing a ranked list of candidate terms. The main approaches to this task fall into four categories: (i) rule-based approaches, (ii) feature-based approaches, (iii) context-based approaches, and (iv) hybrid approaches. For this first TermEval shared task, we explore a feature-based approach and a deep neural network multitask approach (BERT) that we fine-tune for term extraction. We show that BERT models (RoBERTa for English and CamemBERT for French) outperform other systems for French and English.

Anthology ID:: 2020.computerm-1.13
Volume:: Proceedings of the 6th International Workshop on Computational Terminology
Month:: May
Year:: 2020
Address:: Marseille, France
Editors:: Béatrice Daille, Kyo Kageura, Ayla Rigouts Terryn
Venue:: CompuTerm
Publisher:: European Language Resources Association
Pages:: 95–100
Language:: English
URL:: https://aclanthology.org/2020.computerm-1.13
Cite (ACL):: Amir Hazem, Mérieme Bouhandi, Florian Boudin, and Beatrice Daille. 2020. TermEval 2020: TALN-LS2N System for Automatic Term Extraction. In Proceedings of the 6th International Workshop on Computational Terminology, pages 95–100, Marseille, France. European Language Resources Association.
Cite (Informal):: TermEval 2020: TALN-LS2N System for Automatic Term Extraction (Hazem et al., CompuTerm 2020)
PDF:: https://preview.aclanthology.org/nschneid-patch-4/2020.computerm-1.13.pdf
