Inducing Bilingual Lexica From Non-Parallel Data With Earth Mover’s Distance Regularization

Meng Zhang, Yang Liu, Huanbo Luan, Yiqun Liu, Maosong Sun


Abstract
Being able to induce word translations from non-parallel data is often a prerequisite for cross-lingual processing in resource-scarce languages and domains. Previous endeavors typically simplify this task by imposing the one-to-one translation assumption, which is too strong to hold for natural languages. We remove this constraint by introducing the Earth Mover’s Distance into the training of bilingual word embeddings. In this way, we take advantage of its capability to handle multiple alternative word translations in a natural form of regularization. Our approach shows significant and consistent improvements across four language pairs. We also demonstrate that our approach is particularly preferable in resource-scarce settings as it only requires a minimal seed lexicon.
Anthology ID:
C16-1300
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
3188–3198
URL:
https://aclanthology.org/C16-1300
Cite (ACL):
Meng Zhang, Yang Liu, Huanbo Luan, Yiqun Liu, and Maosong Sun. 2016. Inducing Bilingual Lexica From Non-Parallel Data With Earth Mover’s Distance Regularization. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3188–3198, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Inducing Bilingual Lexica From Non-Parallel Data With Earth Mover’s Distance Regularization (Zhang et al., COLING 2016)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/C16-1300.pdf