Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings

Hanan Aldarmaki, Mahesh Mohan, Mona Diab


Abstract
Most existing methods for automatic bilingual dictionary induction rely on prior alignments between the source and target languages, such as parallel corpora or seed dictionaries. For many language pairs, such supervised alignments are not readily available. We propose an unsupervised approach for learning a bilingual dictionary for a pair of languages given their independently learned monolingual word embeddings. The proposed method exploits local and global structures in monolingual vector spaces to align them such that similar words are mapped to each other. We show empirically that bilingual correspondents learned with our unsupervised method perform comparably to supervised bilingual correspondents taken from a seed dictionary.
Anthology ID:
Q18-1014
Volume:
Transactions of the Association for Computational Linguistics, Volume 6
Year:
2018
Address:
Cambridge, MA
Editors:
Lillian Lee, Mark Johnson, Kristina Toutanova, Brian Roark
Venue:
TACL
Publisher:
MIT Press
Pages:
185–196
URL:
https://aclanthology.org/Q18-1014
DOI:
10.1162/tacl_a_00014
Cite (ACL):
Hanan Aldarmaki, Mahesh Mohan, and Mona Diab. 2018. Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings. Transactions of the Association for Computational Linguistics, 6:185–196.
Cite (Informal):
Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings (Aldarmaki et al., TACL 2018)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/Q18-1014.pdf
Data:
WMT 2014