Multilingual Projection for Parsing Truly Low-Resource Languages

Željko Agić, Anders Johannsen, Barbara Plank, Héctor Martínez Alonso, Natalie Schluter, Anders Søgaard


Abstract
We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. The empirical evaluation across 30 test languages shows that our method consistently provides top-level accuracies, close to established upper bounds, and outperforms several competitive baselines.
Anthology ID:
Q16-1022
Volume:
Transactions of the Association for Computational Linguistics, Volume 4
Year:
2016
Address:
Cambridge, MA
Editors:
Lillian Lee, Mark Johnson, Kristina Toutanova
Venue:
TACL
Publisher:
MIT Press
Pages:
301–312
URL:
https://aclanthology.org/Q16-1022
DOI:
10.1162/tacl_a_00100
Cite (ACL):
Željko Agić, Anders Johannsen, Barbara Plank, Héctor Martínez Alonso, Natalie Schluter, and Anders Søgaard. 2016. Multilingual Projection for Parsing Truly Low-Resource Languages. Transactions of the Association for Computational Linguistics, 4:301–312.
Cite (Informal):
Multilingual Projection for Parsing Truly Low-Resource Languages (Agić et al., TACL 2016)
PDF:
https://preview.aclanthology.org/ingest-bitext-workshop/Q16-1022.pdf