Bootstrapping UD treebanks for Delexicalized Parsing

Prasanth Kolachina, Aarne Ranta


Abstract
Standard approaches to treebanking traditionally employ a waterfall model (Sommerville, 2010), where annotation guidelines guide the annotation process and insights from the annotation process in turn lead to subsequent changes in the annotation guidelines. This process remains a very expensive step in creating linguistic resources for a target language: it necessitates both linguistic expertise and manual effort to develop the annotations, and it is subject to inconsistencies in the annotation due to human errors. In this paper, we propose an alternative approach to treebanking, one that requires writing grammars. This approach is motivated specifically in the context of Universal Dependencies, an effort to develop uniform and cross-lingually consistent treebanks across multiple languages. We show here that a bootstrapping approach to treebanking via interlingual grammars is plausible and useful in a process where grammar engineering and treebanking are jointly pursued when creating resources for the target language. We demonstrate the usefulness of synthetic treebanks in the task of delexicalized parsing. Our experiments reveal that simple models for treebank generation are cheaper than human-annotated treebanks, especially in the lower ends of the learning curves for delexicalized parsing, which is relevant in particular in the context of low-resource languages.
Anthology ID:
W19-6102
Volume:
Proceedings of the 22nd Nordic Conference on Computational Linguistics
Month:
September–October
Year:
2019
Address:
Turku, Finland
Editors:
Mareike Hartmann, Barbara Plank
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press
Pages:
15–24
URL:
https://aclanthology.org/W19-6102
Cite (ACL):
Prasanth Kolachina and Aarne Ranta. 2019. Bootstrapping UD treebanks for Delexicalized Parsing. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, pages 15–24, Turku, Finland. Linköping University Electronic Press.
Cite (Informal):
Bootstrapping UD treebanks for Delexicalized Parsing (Kolachina & Ranta, NoDaLiDa 2019)
PDF:
https://preview.aclanthology.org/nschneid-patch-3/W19-6102.pdf