Active DOP: A constituency treebank annotation tool with online learning

Andreas van Cranenburgh


Abstract
We present a language-independent treebank annotation tool supporting rich annotations with discontinuous constituents and function tags. Candidate analyses are generated by an exemplar-based parsing model that immediately learns from each new annotated sentence during annotation. This makes it suitable for situations in which only a limited seed treebank is available, or in which a radically different domain is being annotated. The tool makes it possible to experiment with and evaluate active learning methods to speed up annotation in a naturalistic setting, i.e., by measuring actual annotation costs and tracking specific user interactions. The code is made available under the GNU GPL at https://github.com/andreasvc/activedop.
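The annotate-then-learn loop described in the abstract can be illustrated with a minimal sketch. This is not the Active DOP implementation: `ExemplarModel`, `annotate`, and `oracle` are hypothetical names, word-overlap (Jaccard) similarity stands in for a real exemplar-based parsing model, and "least similar sentence first" is only one possible active learning criterion.

```python
from dataclasses import dataclass, field


@dataclass
class ExemplarModel:
    """Toy stand-in for an exemplar-based parser: it memorizes annotated sentences."""
    exemplars: list = field(default_factory=list)  # (tokens, tree) pairs

    def propose(self, tokens):
        """Return (tree, score) of the most similar exemplar, or (None, 0.0)."""
        best_tree, best_score = None, 0.0
        for known, tree in self.exemplars:
            union = set(tokens) | set(known)
            if not union:
                continue  # two empty sentences carry no evidence
            score = len(set(tokens) & set(known)) / len(union)  # Jaccard similarity
            if score > best_score:
                best_tree, best_score = tree, score
        return best_tree, best_score

    def learn(self, tokens, tree):
        """Online update: the annotation is available for the very next sentence."""
        self.exemplars.append((tokens, tree))


def annotate(model, pool, oracle):
    """Annotate all sentences in pool, least-confident first; return the order used."""
    pool = list(pool)
    order = []
    while pool:
        # Active learning step (assumed criterion): lowest similarity score first.
        tokens = min(pool, key=lambda s: model.propose(s)[1])
        pool.remove(tokens)
        candidate, _ = model.propose(tokens)
        gold = oracle(tokens, candidate)  # the annotator accepts or corrects
        model.learn(tokens, gold)         # the model learns immediately
        order.append(tokens)
    return order
```

In this sketch `oracle` plays the role of the human annotator; in a real setting it would be an interactive step, and the time it takes would be the annotation cost being measured.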
Anthology ID: C18-2009
Volume: Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations
Month: August
Year: 2018
Address: Santa Fe, New Mexico
Editor: Dongyan Zhao
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 38–42
URL: https://aclanthology.org/C18-2009
Cite (ACL): Andreas van Cranenburgh. 2018. Active DOP: A constituency treebank annotation tool with online learning. In Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations, pages 38–42, Santa Fe, New Mexico. Association for Computational Linguistics.
Cite (Informal): Active DOP: A constituency treebank annotation tool with online learning (van Cranenburgh, COLING 2018)
PDF: https://preview.aclanthology.org/fix-dup-bibkey/C18-2009.pdf
Code: andreasvc/activedop
Data: Penn Treebank