Abstract
Bilingual lexicon extraction from comparable corpora is constrained by the small amount of available data when dealing with specialized domains. This scarcity penalizes the performance of distributional approaches, which depends closely on the reliability of the word co-occurrence counts extracted from comparable corpora. One way to overcome this limitation is to associate external resources with the comparable corpus. Since bilingual word embeddings have recently proven to be efficient models for learning bilingual distributed representations of words, we explore different word embedding models and show how a general-domain comparable corpus can enrich a specialized comparable corpus via neural networks.
- Anthology ID:
- I17-1069
- Volume:
- Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
- Month:
- November
- Year:
- 2017
- Address:
- Taipei, Taiwan
- Editors:
- Greg Kondrak, Taro Watanabe
- Venue:
- IJCNLP
- Publisher:
- Asian Federation of Natural Language Processing
- Pages:
- 685–693
- URL:
- https://aclanthology.org/I17-1069
- Cite (ACL):
- Amir Hazem and Emmanuel Morin. 2017. Bilingual Word Embeddings for Bilingual Terminology Extraction from Specialized Comparable Corpora. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 685–693, Taipei, Taiwan. Asian Federation of Natural Language Processing.
- Cite (Informal):
- Bilingual Word Embeddings for Bilingual Terminology Extraction from Specialized Comparable Corpora (Hazem & Morin, IJCNLP 2017)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/I17-1069.pdf