Abstract
This paper presents a new approach to cross-lingual dependency parsing that leverages training data from different source languages to learn a parser for a target language. Specifically, the approach first constructs word vector representations that exploit structural (i.e., dependency-based) contexts while considering only the morpho-syntactic information associated with each word and its contexts. These delexicalized word embeddings, which can be trained on any set of languages and capture features shared across languages, are then combined with standard language-specific features to train a lexicalized parser in the target language. We evaluate our approach through experiments on a set of eight languages that are part of the Universal Dependencies Project. Our main results show that using such delexicalized embeddings, trained in either a monolingual or multilingual fashion, achieves significant improvements over monolingual baselines.
- Anthology ID:
- E17-1023
- Volume:
- Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
- Month:
- April
- Year:
- 2017
- Address:
- Valencia, Spain
- Editors:
- Mirella Lapata, Phil Blunsom, Alexander Koller
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 241–250
- URL:
- https://aclanthology.org/E17-1023
- Cite (ACL):
- Mathieu Dehouck and Pascal Denis. 2017. Delexicalized Word Embeddings for Cross-lingual Dependency Parsing. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 241–250, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal):
- Delexicalized Word Embeddings for Cross-lingual Dependency Parsing (Dehouck & Denis, EACL 2017)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/E17-1023.pdf
- Data
- Universal Dependencies
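The core idea in the abstract is to delexicalize dependency contexts: word forms are replaced by their morpho-syntactic tags before embeddings are trained, so the resulting representations transfer across languages. A minimal sketch of how such delexicalized dependency-context pairs could be extracted is shown below; the token fields follow CoNLL-U conventions, but the helper name, toy sentence, and context encoding are illustrative assumptions, not the paper's actual feature set or training procedure.

```python
from collections import namedtuple

# Toy parsed sentence with CoNLL-U-style fields: id, form, UPOS tag,
# head index (0 = root), and dependency relation.
Token = namedtuple("Token", "id form upos head deprel")

sentence = [
    Token(1, "the", "DET", 2, "det"),
    Token(2, "cat", "NOUN", 3, "nsubj"),
    Token(3, "sleeps", "VERB", 0, "root"),
]

def delex_context_pairs(sent):
    """Emit delexicalized (target, context) pairs over dependency arcs:
    the word form is dropped and only the POS tag is kept, so each
    dependent is paired with its head's tag plus the relation, and each
    head with the dependent's tag plus the inverted relation.
    (Hypothetical helper; the paper uses richer morpho-syntactic features.)"""
    by_id = {t.id: t for t in sent}
    pairs = []
    for t in sent:
        if t.head == 0:  # skip the root, which has no head context
            continue
        head = by_id[t.head]
        pairs.append((t.upos, f"{head.upos}/{t.deprel}"))    # dependent's view
        pairs.append((head.upos, f"{t.upos}/{t.deprel}-1"))  # head's inverse view
    return pairs

print(delex_context_pairs(sentence))
# → [('DET', 'NOUN/det'), ('NOUN', 'DET/det-1'),
#    ('NOUN', 'VERB/nsubj'), ('VERB', 'NOUN/nsubj-1')]
```

Pairs like these could then be fed to any skip-gram-style embedding trainer; because both sides of each pair are tag-based rather than lexical, corpora from several source languages can be pooled, which is what allows the multilingual training regime described in the abstract.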