A Universal Dependencies Conversion Pipeline for a Penn-format Constituency Treebank
Þórunn Arnardóttir, Hinrik Hafsteinsson, Einar Freyr Sigurðsson, Kristín Bjarnadóttir, Anton Karl Ingason, Hildur Jónsdóttir, Steinþór Steingrímsson
Abstract
This paper presents a rule-based pipeline for converting constituency treebanks in the Penn Treebank format to Universal Dependencies (UD). We describe an Icelandic constituency treebank, its annotation scheme, and the UD scheme. We then discuss the conversion, the methods used to deliver a fully automated UD corpus, and the complications involved. To show its applicability to corpora in different languages, we extend the pipeline and convert a Faroese constituency treebank to a UD corpus. The result is an open-source conversion tool, published under an Apache 2.0 license, that can convert a Penn-style treebank to a UD corpus, along with the two new UD corpora.
- Anthology ID:
- 2020.udw-1.3
- Volume:
- Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Marie-Catherine de Marneffe, Miryam de Lhoneux, Joakim Nivre, Sebastian Schuster
- Venue:
- UDW
- Publisher:
- Association for Computational Linguistics
- Pages:
- 16–25
- URL:
- https://aclanthology.org/2020.udw-1.3
- Cite (ACL):
- Þórunn Arnardóttir, Hinrik Hafsteinsson, Einar Freyr Sigurðsson, Kristín Bjarnadóttir, Anton Karl Ingason, Hildur Jónsdóttir, and Steinþór Steingrímsson. 2020. A Universal Dependencies Conversion Pipeline for a Penn-format Constituency Treebank. In Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020), pages 16–25, Barcelona, Spain (Online). Association for Computational Linguistics.
- Cite (Informal):
- A Universal Dependencies Conversion Pipeline for a Penn-format Constituency Treebank (Arnardóttir et al., UDW 2020)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2020.udw-1.3.pdf
- Data
- Universal Dependencies