Universal Dependencies Treebank for Khoekhoe (KDT)

Kira Tulchynska, Sylvanus Job, Alena Witzlack-Makarevich


Abstract
This paper reports on the development of the first dependency treebank for Khoekhoe (KDT). Khoekhoe (Khoe-Kwadi, Namibia) is a low-resource language with few publicly available linguistic and computational resources. The treebank consists of 29k words across six texts taken from various registers and includes a substantial portion of spoken conversational data. The sentences were annotated manually according to the Universal Dependencies framework. Besides presenting the strategies followed to create the treebank, we discuss some challenging morphological features and syntactic constructions found in the corpus and outline how we handled them using the current Universal Dependencies specification.
Anthology ID:
2025.udw-1.12
Volume:
Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025)
Month:
August
Year:
2025
Address:
Ljubljana, Slovenia
Editors:
Gosse Bouma, Çağrı Çöltekin
Venues:
UDW | WS | SyntaxFest
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
119–128
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.udw-1.12/
Cite (ACL):
Kira Tulchynska, Sylvanus Job, and Alena Witzlack-Makarevich. 2025. Universal Dependencies Treebank for Khoekhoe (KDT). In Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025), pages 119–128, Ljubljana, Slovenia. Association for Computational Linguistics.
Cite (Informal):
Universal Dependencies Treebank for Khoekhoe (KDT) (Tulchynska et al., UDW-SyntaxFest 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.udw-1.12.pdf