Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models

Amir Hazem, Emmanuel Morin


Abstract
There is a rich flora of word space models that have proven effective in many different applications, including information retrieval (Dumais, 1988), word sense disambiguation (Schütze, 1992), various semantic knowledge tests (Lund et al., 1995; Karlgren, 2001), and text categorization (Sahlgren, 2005). Based on the assumption that each model captures some aspects of word meaning and provides its own empirical evidence, we present in this paper a systematic exploration of the principal corpus-based word space models for bilingual terminology extraction from comparable corpora. We find that, once the best procedures have been identified, a very simple combination approach leads to significant improvements over individual models.
Anthology ID:
L16-1661
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4184–4187
URL:
https://aclanthology.org/L16-1661
Cite (ACL):
Amir Hazem and Emmanuel Morin. 2016. Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4184–4187, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Improving Bilingual Terminology Extraction from Comparable Corpora via Multiple Word-Space Models (Hazem & Morin, LREC 2016)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/L16-1661.pdf