Abstract
Cross-lingual word embedding models learn a shared vector space for two or more languages so that words with similar meanings are represented by similar vectors regardless of their language. Although existing models achieve high performance on pairs of morphologically simple languages, they perform very poorly on morphologically rich languages such as Turkish and Finnish. In this paper, we propose a morpheme-based model to improve the performance of cross-lingual word embeddings on morphologically rich languages. Our model includes a simple extension that enables us to exploit morphemes for cross-lingual mapping. We apply our model to the Turkish-Finnish language pair on the bilingual word translation task. Results show that our model outperforms the baseline models by 2% in the nearest neighbour ranking.

- Anthology ID: R19-1140
- Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
- Month: September
- Year: 2019
- Address: Varna, Bulgaria
- Editors: Ruslan Mitkov, Galia Angelova
- Venue: RANLP
- Publisher: INCOMA Ltd.
- Pages: 1222–1228
- URL: https://aclanthology.org/R19-1140
- DOI: 10.26615/978-954-452-056-4_140
- Cite (ACL): Ahmet Üstün, Gosse Bouma, and Gertjan van Noord. 2019. Cross-Lingual Word Embeddings for Morphologically Rich Languages. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 1222–1228, Varna, Bulgaria. INCOMA Ltd.
- Cite (Informal): Cross-Lingual Word Embeddings for Morphologically Rich Languages (Üstün et al., RANLP 2019)
- PDF: https://preview.aclanthology.org/nschneid-patch-5/R19-1140.pdf