Abstract
Enabling cross-lingual NLP tasks by leveraging multilingual word embedding has recently attracted much attention. An important motivation is to support lower resourced languages, however, most efforts focus on demonstrating the effectiveness of the techniques using embeddings derived from similar languages to English with large parallel content. In this study, we first describe the general requirements for the success of these techniques and then present a noise tolerant piecewise linear technique to learn a non-linear mapping between two monolingual word embedding vector spaces. We evaluate our approach on inferring bilingual dictionaries. We show that our technique outperforms the state-of-the-art in lower resourced settings with an average of 3.7% improvement of precision @10 across 14 mostly low resourced languages.- Anthology ID:
- D19-1076
- Volume:
- Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Venues:
- EMNLP | IJCNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 823–832
- Language:
- URL:
- https://aclanthology.org/D19-1076
- DOI:
- 10.18653/v1/D19-1076
- Cite (ACL):
- Masud Moshtaghi. 2019. Supervised and Nonlinear Alignment of Two Embedding Spaces for Dictionary Induction in Low Resourced Languages. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 823–832, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- Supervised and Nonlinear Alignment of Two Embedding Spaces for Dictionary Induction in Low Resourced Languages (Moshtaghi, EMNLP-IJCNLP 2019)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/D19-1076.pdf