Language classification from bilingual word embedding graphs

Steffen Eger, Armin Hoenen, Alexander Mehler


Abstract
We study the role of the second language in bilingual word embeddings in monolingual semantic evaluation tasks. We find strongly and weakly positive correlations between downstream task performance and second language similarity to the target language. Additionally, we show how bilingual word embeddings can be employed for the task of semantic language classification and that joint semantic spaces vary in meaningful ways across second languages. Our results support the hypothesis that semantic language similarity is influenced by both structural similarity and geography/contact.
Anthology ID:
C16-1331
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
3507–3518
URL:
https://aclanthology.org/C16-1331
Cite (ACL):
Steffen Eger, Armin Hoenen, and Alexander Mehler. 2016. Language classification from bilingual word embedding graphs. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3507–3518, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Language classification from bilingual word embedding graphs (Eger et al., COLING 2016)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/C16-1331.pdf