Abstract
Bilingual word embeddings, which represent the lexicons of different languages in a shared embedding space, are essential for supporting semantic and knowledge transfer in a variety of cross-lingual NLP tasks. Existing approaches to training bilingual word embeddings require either large collections of predefined seed lexicons, which are expensive to obtain, or parallel sentences, which provide coarse and noisy alignment. In contrast, we propose BiLex, which leverages publicly available lexical definitions for bilingual word embedding learning. Without the need for predefined seed lexicons, BiLex employs a novel word pairing strategy to automatically identify and propagate precise, fine-grained word alignments from lexical definitions. We evaluate BiLex on word-level and sentence-level translation tasks, which seek to find the cross-lingual counterparts of words and sentences, respectively. BiLex significantly outperforms previous embedding methods on both tasks.
- Anthology ID:
- W19-4316
- Volume:
- Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Isabelle Augenstein, Spandana Gella, Sebastian Ruder, Katharina Kann, Burcu Can, Johannes Welbl, Alexis Conneau, Xiang Ren, Marek Rei
- Venue:
- RepL4NLP
- SIG:
- SIGREP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 142–147
- URL:
- https://aclanthology.org/W19-4316
- DOI:
- 10.18653/v1/W19-4316
- Cite (ACL):
- Weijia Shi, Muhao Chen, Yingtao Tian, and Kai-Wei Chang. 2019. Learning Bilingual Word Embeddings Using Lexical Definitions. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), pages 142–147, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Learning Bilingual Word Embeddings Using Lexical Definitions (Shi et al., RepL4NLP 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/W19-4316.pdf