Learning Morphosyntactic Analyzers from the Bible via Iterative Annotation Projection across 26 Languages

Garrett Nicolai, David Yarowsky


Abstract
A large percentage of computational language tools are concentrated in a very small subset of the world's languages. Compounding the issue, many languages lack the high-quality linguistic annotation that current machine learning methods require to build such tools. In this paper, we address both issues simultaneously: leveraging the high accuracy of English taggers and parsers, we project morphological information onto translations of the Bible in 26 varied test languages. Using an iterative discovery, constraint, and training process, we build inflectional lexica in the target languages. Through a combination of iteration, ensembling, and reranking, we obtain double-digit relative error reductions in lemmatization and morphological analysis over a strong initial system.
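The core projection idea in the abstract can be illustrated with a minimal sketch. It assumes word-aligned verse pairs and UniMorph-style feature strings produced by an English tagger. The input format and the names `project_verse` and `build_lexicon` are hypothetical. Simple majority voting stands in for the paper's iterative discovery, constraint, and training process, which this sketch does not reproduce.

```python
from collections import Counter, defaultdict

# Hypothetical input format (an assumption, not the paper's data format):
# each aligned verse pair is (english_feats, target_tokens, alignment), where
# english_feats[i] is a UniMorph-style feature string for English token i
# and alignment is a list of (english_index, target_index) word links.


def project_verse(english_feats, target_tokens, alignment):
    """Copy each English token's features onto every target token aligned to it."""
    return [(target_tokens[t], english_feats[e]) for e, t in alignment]


def build_lexicon(corpus, min_votes=2):
    """Keep, for each target form, its most frequent projected analysis
    if that analysis received at least min_votes projections."""
    votes = defaultdict(Counter)
    for english_feats, target_tokens, alignment in corpus:
        for form, feats in project_verse(english_feats, target_tokens, alignment):
            votes[form][feats] += 1
    lexicon = {}
    for form, counter in votes.items():
        feats, count = counter.most_common(1)[0]
        if count >= min_votes:
            lexicon[form] = feats
    return lexicon


# Toy German "verses" with one-to-one alignments (illustrative data only).
corpus = [
    (["PRON;1;SG", "V;PST"], ["ich", "ging"], [(0, 0), (1, 1)]),
    (["PRON;3;SG", "V;PST"], ["er", "ging"], [(0, 0), (1, 1)]),
]
print(build_lexicon(corpus, min_votes=2))  # {'ging': 'V;PST'}
```

Because a form must receive at least `min_votes` consistent projections, one-off (and possibly noisy) alignments are filtered out; in the toy corpus only *ging* passes the threshold.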
Anthology ID:
P19-1172
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1765–1774
URL:
https://aclanthology.org/P19-1172
DOI:
10.18653/v1/P19-1172
Cite (ACL):
Garrett Nicolai and David Yarowsky. 2019. Learning Morphosyntactic Analyzers from the Bible via Iterative Annotation Projection across 26 Languages. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1765–1774, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Learning Morphosyntactic Analyzers from the Bible via Iterative Annotation Projection across 26 Languages (Nicolai & Yarowsky, ACL 2019)
PDF:
https://preview.aclanthology.org/update-css-js/P19-1172.pdf
Supplementary:
P19-1172.Supplementary.pdf
Video:
https://vimeo.com/384495837