Abstract
State-of-the-art methods for unsupervised bilingual word embeddings (BWE) train a mapping function that maps pre-trained monolingual word embeddings into a bilingual space. Despite its remarkable results, unsupervised mapping is also well-known to be limited by the original dissimilarity between the word embedding spaces to be mapped. In this work, we propose a new approach that trains unsupervised BWE jointly on synthetic parallel data generated through unsupervised machine translation. We demonstrate that existing algorithms that jointly train BWE are very robust to noisy training data and show that unsupervised BWE jointly trained significantly outperform unsupervised mapped BWE in several cross-lingual NLP tasks.

- Anthology ID:
- P19-1312
- Volume:
- Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
- Month:
- July
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Anna Korhonen, David Traum, Lluís Màrquez
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3224–3230
- URL:
- https://aclanthology.org/P19-1312
- DOI:
- 10.18653/v1/P19-1312
- Cite (ACL):
- Benjamin Marie and Atsushi Fujita. 2019. Unsupervised Joint Training of Bilingual Word Embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3224–3230, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Unsupervised Joint Training of Bilingual Word Embeddings (Marie & Fujita, ACL 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/P19-1312.pdf