Abstract
This article describes a collection of sentences in K’iche’ annotated for morphology and syntax. K’iche’ is a language in the Mayan language family, spoken in Guatemala. The annotation is done according to the guidelines of the Universal Dependencies project. The corpus consists of a total of 1,433 sentences containing approximately 10,000 tokens and is released under a free/open-source licence. We present a comparison of parsing systems for K’iche’ using this corpus and describe how it can be used for mining linguistic examples.

- Anthology ID:
- 2021.americasnlp-1.2
- Volume:
- Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Venue:
- AmericasNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 10–20
- URL:
- https://aclanthology.org/2021.americasnlp-1.2
- DOI:
- 10.18653/v1/2021.americasnlp-1.2
- Cite (ACL):
- Francis Tyers and Robert Henderson. 2021. A corpus of K’iche’ annotated for morphosyntactic structure. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 10–20, Online. Association for Computational Linguistics.
- Cite (Informal):
- A corpus of K’iche’ annotated for morphosyntactic structure (Tyers & Henderson, AmericasNLP 2021)
- PDF:
- https://preview.aclanthology.org/auto-file-uploads/2021.americasnlp-1.2.pdf