Cross-Lingual Syntactically Informed Distributed Word Representations

Ivan Vulić


Abstract
We develop a novel cross-lingual word representation model which injects syntactic information, through dependency-based contexts, into a shared cross-lingual word vector space. The model, termed CL-DepEmb, rests on two assumptions: (1) dependency relations are largely language-independent, at least for related languages and for prominent dependency links such as direct objects, as evidenced by the Universal Dependencies project; (2) word translation equivalents take similar grammatical roles in a sentence and are therefore substitutable within their syntactic contexts. Experiments with several language pairs on word similarity and bilingual lexicon induction, two fundamental tasks emphasising semantic similarity, suggest the usefulness of the proposed syntactically informed cross-lingual word vector spaces. Improvements are observed in both tasks over standard cross-lingual “offline mapping” baselines trained with the same setup and an equal level of bilingual supervision.
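To make the notion of "dependency-based contexts" concrete, here is a minimal, hypothetical sketch of how such contexts can be extracted from a dependency parse (in the Levy-and-Goldberg style that the abstract's setup builds on): each word is paired with its dependents labelled by relation, and with its head via an inverse-labelled link. The function and data names below are illustrative, not taken from the paper.

```python
# Illustrative sketch (not the paper's code): extract dependency-based
# (word, context) pairs from a parsed sentence. A token is represented as
# (index, word, head_index, deprel), with head_index = -1 for the root.

def dependency_contexts(tokens):
    """Return (word, context) pairs: 'dependent/rel' contexts for heads,
    and 'head/rel-1' (inverse) contexts for dependents."""
    pairs = []
    for idx, word, head, rel in tokens:
        if head >= 0:  # skip the root, which has no head
            head_word = tokens[head][1]
            pairs.append((word, f"{head_word}/{rel}-1"))  # link up to the head
            pairs.append((head_word, f"{word}/{rel}"))    # link down to the dependent
    return pairs

# Toy parse of "scientists discover stars" (Universal Dependencies relations)
sent = [
    (0, "scientists", 1, "nsubj"),
    (1, "discover", -1, "root"),
    (2, "stars", 1, "obj"),
]
print(dependency_contexts(sent))
```

Because relation labels such as `nsubj` and `obj` come from the cross-lingually consistent Universal Dependencies inventory, translation equivalents in two languages tend to receive analogous context sets, which is the substitutability assumption the model exploits.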
Anthology ID:
E17-2065
Volume:
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Mirella Lapata, Phil Blunsom, Alexander Koller
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
408–414
URL:
https://aclanthology.org/E17-2065
Cite (ACL):
Ivan Vulić. 2017. Cross-Lingual Syntactically Informed Distributed Word Representations. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 408–414, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Cross-Lingual Syntactically Informed Distributed Word Representations (Vulić, EACL 2017)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/E17-2065.pdf