Abstract
Cross-lingual word embeddings create a shared space for embeddings in two languages, and enable knowledge to be transferred between languages for tasks such as bilingual lexicon induction. One problem, however, is out-of-vocabulary (OOV) words, for which no embeddings are available. This is particularly problematic for low-resource and morphologically-rich languages, which often have relatively high OOV rates. Approaches to learning sub-word embeddings have been proposed to address the problem of OOV words, but most prior work has not considered sub-word embeddings in cross-lingual models. In this paper, we consider whether sub-word embeddings can be leveraged to form cross-lingual embeddings for OOV words. Specifically, we consider a novel bilingual lexicon induction task focused on OOV words, for language pairs covering several language families. Our results indicate that cross-lingual representations for OOV words can indeed be formed from sub-word embeddings, including in the case of a truly low-resource morphologically-rich language.
- Anthology ID:
- 2020.lrec-1.330
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 2712–2719
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.330
- Cite (ACL):
- Ali Hakimi Parizi and Paul Cook. 2020. Evaluating Sub-word Embeddings in Cross-lingual Models. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2712–2719, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Evaluating Sub-word Embeddings in Cross-lingual Models (Hakimi Parizi & Cook, LREC 2020)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2020.lrec-1.330.pdf
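The abstract's central idea is to build a representation for an OOV word from its sub-words and then look up its translation in a shared cross-lingual space. This can be illustrated with a minimal sketch. It is not the authors' method, and it rests on several assumptions:

- It uses fastText-style character n-grams with `<` and `>` boundary markers, by default 3 to 6 characters, without fastText's hashing of n-grams into buckets.
- It averages whichever n-gram vectors are known.
- It projects the result with a linear map `W`, which is assumed to have been learned separately (for example by orthogonal Procrustes on a seed dictionary).
- It returns the nearest target-language word by cosine similarity.

The functions `char_ngrams`, `oov_vector`, `apply_map` and `translate_oov`, and the toy vectors, are all hypothetical.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Character n-grams of the word padded with boundary markers (fastText-style)."""
    padded = f"<{word}>"
    grams: List[str] = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def oov_vector(word: str, ngram_vectors: Dict[str, Vector],
               n_min: int = 3, n_max: int = 6) -> Optional[Vector]:
    """Average the vectors of the word's known n-grams; None if none are known."""
    known = [ngram_vectors[g] for g in char_ngrams(word, n_min, n_max)
             if g in ngram_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


def apply_map(W: List[Vector], x: Vector) -> Vector:
    """Project x into the shared space with a linear map W (assumed pre-learned)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def translate_oov(word: str, ngram_vectors: Dict[str, Vector], W: List[Vector],
                  target_vocab: Dict[str, Vector],
                  n_min: int = 3, n_max: int = 6) -> Optional[str]:
    """Return the target word nearest (by cosine) to the mapped OOV vector."""
    x = oov_vector(word, ngram_vectors, n_min, n_max)
    if x is None:
        return None
    z = apply_map(W, x)
    return max(target_vocab, key=lambda t: cosine(z, target_vocab[t]))


if __name__ == "__main__":
    # Toy 2-d data, purely illustrative: only two of the n-grams of "cats" are known.
    ngrams = {"<ca": [1.0, 0.0], "cat": [1.0, 0.0]}
    identity = [[1.0, 0.0], [0.0, 1.0]]
    target = {"chats": [0.9, 0.1], "chiens": [0.1, 0.9]}
    print(translate_oov("cats", ngrams, identity, target, n_min=3, n_max=3))  # chats
```

Averaging over only the known n-grams lets the sketch still produce a vector when some sub-words are unseen. It returns `None` only when none of the word's n-grams have vectors, which mirrors the case where sub-word information cannot help at all.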