2022
pdf
abs
D-Terminer: Online Demo for Monolingual and Bilingual Automatic Term Extraction
Ayla Rigouts Terryn
|
Veronique Hoste
|
Els Lefever
Proceedings of the Workshop on Terminology in the 21st century: many faces, many places
This contribution presents D-Terminer: an open access, online demo for monolingual and multilingual automatic term extraction from parallel corpora. The monolingual term extraction is based on a recurrent neural network, with a supervised methodology that relies on pretrained embeddings. Candidate terms can be tagged in their original context and there is no need for a large corpus, as the methodology will work even for single sentences. With the bilingual term extraction from parallel corpora, potentially equivalent candidate term pairs are extracted from translation memories and manual annotation of the results shows that good equivalents are found for most candidate terms. Accompanying the release of the demo is an updated version of the ACTER Annotated Corpora for Term Extraction Research (version 1.5).
2020
pdf
bib
Proceedings of the 6th International Workshop on Computational Terminology
Béatrice Daille
|
Kyo Kageura
|
Ayla Rigouts Terryn
Proceedings of the 6th International Workshop on Computational Terminology
pdf
abs
TermEval 2020: Shared Task on Automatic Term Extraction Using the Annotated Corpora for Term Extraction Research (ACTER) Dataset
Ayla Rigouts Terryn
|
Veronique Hoste
|
Patrick Drouin
|
Els Lefever
Proceedings of the 6th International Workshop on Computational Terminology
The TermEval 2020 shared task provided a platform for researchers to work on automatic term extraction (ATE) with the same dataset: the Annotated Corpora for Term Extraction Research (ACTER). The dataset covers three languages (English, French, and Dutch) and four domains, of which the domain of heart failure was kept as a held-out test set on which final f1-scores were calculated. The aim was to provide a large, transparent, qualitatively annotated, and diverse dataset to the ATE research community, with the goal of promoting comparative research and thus identifying strengths and weaknesses of various state-of-the-art methodologies. The results show a lot of variation between different systems and illustrate how some methodologies reach higher precision or recall, how different systems extract different types of terms, how some are exceptionally good at finding rare terms, or are less impacted by term length. The current contribution offers an overview of the shared task with a comparative evaluation, which complements the individual papers by all participants.
2019
pdf
abs
Analysing the Impact of Supervised Machine Learning on Automatic Term Extraction: HAMLET vs TermoStat
Ayla Rigouts Terryn
|
Patrick Drouin
|
Veronique Hoste
|
Els Lefever
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Traditional approaches to automatic term extraction do not rely on machine learning (ML) and select the top n ranked candidate terms or candidate terms above a certain predefined cut-off point, based on a limited number of linguistic and statistical clues. However, supervised ML approaches are gaining interest. Relatively little is known about the impact of these supervised methodologies; evaluations are often limited to precision, and sometimes recall and f1-scores, without information about the nature of the extracted candidate terms. Therefore, the current paper presents a detailed and elaborate analysis and comparison of a traditional, state-of-the-art system (TermoStat) and a new, supervised ML approach (HAMLET), using the results obtained for the same, manually annotated, Dutch corpus about dressage.
2018
pdf
A Gold Standard for Multilingual Automatic Term Extraction from Comparable Corpora: Term Structure and Translation Equivalents
Ayla Rigouts Terryn
|
Véronique Hoste
|
Els Lefever
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)