Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis

Amir Hazem, Béatrice Daille


Abstract
Bilingual lexicon extraction from comparable corpora is usually based on distributional methods when dealing with single-word terms (SWTs). These methods often treat SWTs as single tokens without considering their compositional properties. However, many SWTs are compositional (composed of roots and affixes), and this information, if taken into account, can be very useful for matching translation pairs, especially for infrequent terms, where distributional methods often fail. For instance, the English compound xenograft, which is composed of the root xeno and the lexeme graft, can be translated into French compositionally by aligning each of its elements (xeno with xéno and graft with greffe), resulting in the translation xénogreffe. In this paper, we experiment with several distributional models at the morpheme level, which we apply to perform compositional translation on a subset of French and English compounds. We show promising results using distributional analysis at the root and affix levels. We also show that the adapted approach significantly improves bilingual lexicon extraction from comparable corpora compared to the word-level approach.
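The xenograft example can be sketched in a few lines of Python. This is only an illustration of compositional translation, not the method evaluated in the paper: the lexicons `ROOTS` and `LEXEMES` and the functions `split_compound` and `translate_compositionally` are hypothetical names, and the morpheme alignments are hard-coded here, whereas the paper obtains them through distributional analysis over comparable corpora.

```python
from itertools import product

# Hypothetical toy lexicons (assumption): they contain only the pair given in
# the abstract. In the paper, morpheme-level alignments come from
# distributional analysis, not from a hand-written table.
ROOTS = {"xeno": ["xéno"]}
LEXEMES = {"graft": ["greffe"]}


def split_compound(word, roots, lexemes):
    """Return every (root, lexeme) split where both parts are known."""
    return [
        (word[:i], word[i:])
        for i in range(1, len(word))
        if word[:i] in roots and word[i:] in lexemes
    ]


def translate_compositionally(word, roots=ROOTS, lexemes=LEXEMES):
    """Translate each element separately and concatenate the results."""
    candidates = []
    for root, lexeme in split_compound(word, roots, lexemes):
        for target_root, target_lexeme in product(roots[root], lexemes[lexeme]):
            candidates.append(target_root + target_lexeme)
    return candidates


print(translate_compositionally("xenograft"))  # ['xénogreffe']
```

A word with no known split, such as `graft` on its own, yields an empty candidate list, which is where a word-level fallback would be needed.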
Anthology ID:
L16-1496
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3110–3115
URL:
https://aclanthology.org/L16-1496
Cite (ACL):
Amir Hazem and Béatrice Daille. 2016. Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3110–3115, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Bilingual Lexicon Extraction at the Morpheme Level Using Distributional Analysis (Hazem & Daille, LREC 2016)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/L16-1496.pdf