Abstract
Cross-lingual word embeddings provide a way for information to be transferred between languages. In this paper we evaluate an extension of a joint training approach to learning cross-lingual embeddings that incorporates sub-word information during training. This method could be particularly well-suited to lower-resource and morphologically-rich languages because it can be trained on modest-sized monolingual corpora, and is able to represent out-of-vocabulary words (OOVs). We consider bilingual lexicon induction, including an evaluation focused on OOVs. We find that this method achieves improvements over previous approaches, particularly for OOVs.
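The abstract only names the two ideas it relies on, so here is a minimal illustrative sketch of them: composing an out-of-vocabulary word from character n-gram (sub-word) vectors, and inducing a bilingual lexicon by nearest-neighbour search in a shared cross-lingual space. This is not the authors' implementation; the vectors are toy data and the helper names (`embed_oov`, `translate`) are assumptions made for the example.

```python
# Illustrative sketch only: OOV representation from character n-grams and
# bilingual lexicon induction by cosine nearest neighbours. All vectors are
# random toy data, not trained cross-lingual embeddings.
import numpy as np

DIM = 4
rng = np.random.default_rng(0)

def ngrams(word, n_min=3, n_max=4):
    """Character n-grams of a word, with boundary markers (fastText-style)."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

# Toy sub-word vectors for the source language, assumed to already live in
# the shared cross-lingual space produced by joint training.
subword_vecs = {}

def subword_vec(gram):
    if gram not in subword_vecs:
        subword_vecs[gram] = rng.normal(size=DIM)
    return subword_vecs[gram]

def embed_oov(word):
    """Represent an out-of-vocabulary word as the mean of its n-gram vectors."""
    return np.mean([subword_vec(g) for g in ngrams(word)], axis=0)

# Toy target-language vocabulary in the same shared space.
target_vocab = {w: rng.normal(size=DIM) for w in ["haus", "hund", "katze"]}

def translate(src_vec, k=1):
    """Bilingual lexicon induction step: k nearest target words by cosine."""
    sims = {w: src_vec @ v / (np.linalg.norm(src_vec) * np.linalg.norm(v))
            for w, v in target_vocab.items()}
    return sorted(sims, key=sims.get, reverse=True)[:k]

print(translate(embed_oov("housecat"), k=2))
```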
- Anthology ID: 2021.starsem-1.29
- Volume: Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics
- Month: August
- Year: 2021
- Address: Online
- Venue: *SEM
- SIG: SIGSEM
- Publisher: Association for Computational Linguistics
- Pages: 302–307
- URL: https://aclanthology.org/2021.starsem-1.29
- DOI: 10.18653/v1/2021.starsem-1.29
- Cite (ACL): Ali Hakimi Parizi and Paul Cook. 2021. Evaluating a Joint Training Approach for Learning Cross-lingual Embeddings with Sub-word Information without Parallel Corpora on Lower-resource Languages. In Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics, pages 302–307, Online. Association for Computational Linguistics.
- Cite (Informal): Evaluating a Joint Training Approach for Learning Cross-lingual Embeddings with Sub-word Information without Parallel Corpora on Lower-resource Languages (Hakimi Parizi & Cook, *SEM 2021)
- PDF: https://preview.aclanthology.org/author-url/2021.starsem-1.29.pdf