Abstract
We describe a simple but effective method for cross-lingual syntactic transfer of dependency parsers, in the scenario where a large amount of translation data is not available. This method makes use of three steps: 1) a method for deriving cross-lingual word clusters, which can then be used in a multilingual parser; 2) a method for transferring lexical information from a target language to source language treebanks; 3) a method for integrating these steps with the density-driven annotation projection method of Rasooli and Collins (2015). Experiments show improvements over the state-of-the-art in several languages used in previous work, in a setting where the only source of translation data is the Bible, a considerably smaller corpus than the Europarl corpus used in previous work. Results using the Europarl corpus as a source of translation data show additional improvements over the results of Rasooli and Collins (2015). We conclude with results on 38 datasets from the Universal Dependencies corpora.
- Anthology ID:
- Q17-1020
- Volume:
- Transactions of the Association for Computational Linguistics, Volume 5
- Year:
- 2017
- Address:
- Cambridge, MA
- Editors:
- Lillian Lee, Mark Johnson, Kristina Toutanova
- Venue:
- TACL
- Publisher:
- MIT Press
- Pages:
- 279–293
- URL:
- https://aclanthology.org/Q17-1020
- DOI:
- 10.1162/tacl_a_00061
- Cite (ACL):
- Mohammad Sadegh Rasooli and Michael Collins. 2017. Cross-Lingual Syntactic Transfer with Limited Resources. Transactions of the Association for Computational Linguistics, 5:279–293.
- Cite (Informal):
- Cross-Lingual Syntactic Transfer with Limited Resources (Rasooli & Collins, TACL 2017)
- PDF:
- https://preview.aclanthology.org/naacl24-info/Q17-1020.pdf
- Code:
- rasoolims/YaraParser