Combining Word Embeddings with Bilingual Orthography Embeddings for Bilingual Dictionary Induction
Silvia Severini, Viktor Hangya, Alexander Fraser, Hinrich Schütze
Abstract
Bilingual dictionary induction (BDI) is the task of accurately translating words to the target language. It is of great importance in many low-resource scenarios where cross-lingual training data is not available. To perform BDI, bilingual word embeddings (BWEs) are often used due to their low bilingual training signal requirements. They achieve high performance, but problematic cases still remain, such as the translation of rare words or named entities, which often need to be transliterated. In this paper, we enrich BWE-based BDI with transliteration information by using Bilingual Orthography Embeddings (BOEs). BOEs represent source and target language transliteration word pairs with similar vectors. A key problem in our BDI setup is to decide which information source – BWEs (or semantics) vs. BOEs (or orthography) – is more reliable for a particular word pair. We propose a novel classification-based BDI system that uses BWEs, BOEs and a number of other features to make this decision. We test our system on English-Russian BDI and show improved performance. In addition, we show the effectiveness of our BOEs by successfully using them for transliteration mining based on cosine similarity.
- Anthology ID:
- 2020.coling-main.531
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Donia Scott, Nuria Bel, Chengqing Zong
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 6044–6055
- URL:
- https://aclanthology.org/2020.coling-main.531
- DOI:
- 10.18653/v1/2020.coling-main.531
- Cite (ACL):
- Silvia Severini, Viktor Hangya, Alexander Fraser, and Hinrich Schütze. 2020. Combining Word Embeddings with Bilingual Orthography Embeddings for Bilingual Dictionary Induction. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6044–6055, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- Combining Word Embeddings with Bilingual Orthography Embeddings for Bilingual Dictionary Induction (Severini et al., COLING 2020)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/2020.coling-main.531.pdf