Supervised Feature-based Classification Approach to Bilingual Lexicon Induction from Specialised Comparable Corpora

Ayla Rigouts Terryn


Abstract
This study, submitted to the BUCC2023 shared task on bilingual term alignment in comparable specialised corpora, introduces a supervised, feature-based classification approach. The approach employs both static cross-lingual embeddings and contextual multilingual embeddings, combined with surface-level indicators such as Levenshtein distance and term length, as well as linguistic information. Results exhibit improved performance over previous methodologies, illustrating the merit of integrating diverse features. However, the error analysis also reveals remaining challenges.
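As a rough illustration of the surface-level indicators mentioned in the abstract, the sketch below computes Levenshtein distance and term-length features for a candidate term pair. It is not the paper's implementation: the function names, the feature names, the lowercasing step, and the normalisation by the longer term are all assumptions made for this example. The embedding-based and linguistic features are omitted.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def surface_features(src: str, tgt: str) -> dict:
    """Hypothetical surface-level feature set for one candidate term pair."""
    dist = levenshtein(src.lower(), tgt.lower())
    longest = max(len(src), len(tgt), 1)  # guard against two empty strings
    return {
        "levenshtein": dist,
        "levenshtein_norm": dist / longest,  # assumed normalisation
        "len_src": len(src),
        "len_tgt": len(tgt),
        "len_diff": abs(len(src) - len(tgt)),
    }


features = surface_features("cardiology", "cardiologie")
```

In a supervised setup of the kind the abstract describes, a feature dictionary like this would presumably be concatenated with embedding similarity scores and passed to a binary classifier that decides whether the pair is a valid translation.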
Anthology ID:
2023.contents-1.8
Volume:
Proceedings of the Workshop on Computational Terminology in NLP and Translation Studies (ConTeNTS) Incorporating the 16th Workshop on Building and Using Comparable Corpora (BUCC)
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Amal Haddad Haddad, Ayla Rigouts Terryn, Ruslan Mitkov, Reinhard Rapp, Pierre Zweigenbaum, Serge Sharoff
Venues:
ConTeNTS | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
59–68
URL:
https://aclanthology.org/2023.contents-1.8
Cite (ACL):
Ayla Rigouts Terryn. 2023. Supervised Feature-based Classification Approach to Bilingual Lexicon Induction from Specialised Comparable Corpora. In Proceedings of the Workshop on Computational Terminology in NLP and Translation Studies (ConTeNTS) Incorporating the 16th Workshop on Building and Using Comparable Corpora (BUCC), pages 59–68, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Supervised Feature-based Classification Approach to Bilingual Lexicon Induction from Specialised Comparable Corpora (Rigouts Terryn, ConTeNTS-WS 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.contents-1.8.pdf