Abstract
In this paper, we focus on parsing rare and non-trivial constructions, in particular ellipsis. We report on several experiments in enriching the training data for this specific construction, evaluated on five languages: Czech, English, Finnish, Russian and Slovak. These data enrichment methods draw upon self-training and tri-training, combined with a stratified sampling method that mimics the structural complexity of the original treebank. In addition, using these same methods, we demonstrate small improvements over the winning system of the CoNLL-17 parsing shared task for four of the five languages, and these improvements are not restricted to elliptical constructions.
- Anthology ID: W18-6006
- Volume: Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)
- Month: November
- Year: 2018
- Address: Brussels, Belgium
- Editors: Marie-Catherine de Marneffe, Teresa Lynn, Sebastian Schuster
- Venue: UDW
- Publisher: Association for Computational Linguistics
- Pages: 47–54
- URL: https://aclanthology.org/W18-6006
- DOI: 10.18653/v1/W18-6006
- Cite (ACL): Kira Droganova, Filip Ginter, Jenna Kanerva, and Daniel Zeman. 2018. Mind the Gap: Data Enrichment in Dependency Parsing of Elliptical Constructions. In Proceedings of the Second Workshop on Universal Dependencies (UDW 2018), pages 47–54, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal): Mind the Gap: Data Enrichment in Dependency Parsing of Elliptical Constructions (Droganova et al., UDW 2018)
- PDF: https://preview.aclanthology.org/improve-issue-templates/W18-6006.pdf
- Data: Universal Dependencies