UzUDT: Uzbek Universal Dependencies Treebank

Sanatbek Gayratovich Matlatipov, Mersaid Aripov


Abstract
In this paper, we present a new Universal Dependencies treebank for the Uzbek language (UzUDT), developed as a gold-standard resource with full manual annotation. The treebank comprises 684 sentences (7,582 tokens) drawn from Uzbek literary texts and is larger and more domain-diverse than the existing Uzbek UD treebank. The corpus was built through rigorous multi-annotator adjudication and achieves very high inter-annotator agreement (multi-rater agreement coefficients above 0.90) for lemmatization, PoS tagging, and morphological features. Alongside comprehensive corpus profiling, we establish robust computational baselines by evaluating graph-based (Stanza) and transition-based (spaCy) parsing architectures with both static and monolingual contextual embeddings. Our evaluations reveal a critical architectural trade-off for parsing a low-resource agglutinative language: joint transition-based models excel at morphosyntactic tagging, whereas graph-based models remain strictly superior at resolving complex structural dependencies. Furthermore, we show that cross-treebank data augmentation yields substantial, synergistic accuracy gains. The resource provides a much-needed high-quality treebank for Uzbek that supports the development of better NLP tools and enables linguistic research on this low-resource language.
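The abstract reports multi-rater agreement coefficients above 0.90 but does not name the statistic on this page. As an illustration only, the sketch below assumes Fleiss' kappa, one common coefficient for more than two annotators, computed over per-token labels such as UPOS tags. The function name `fleiss_kappa` and the toy ratings are hypothetical and are not taken from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for items that are each labeled by the same number of raters.

    Assumption (illustrative): ratings[i] holds the labels (e.g. UPOS tags)
    that every annotator assigned to token i.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement per item: fraction of rater pairs that agree.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall (marginal) label distribution.
    totals: Counter = Counter()
    for cnt in counts:
        totals.update(cnt)
    total = n_items * n_raters
    p_e = sum((v / total) ** 2 for v in totals.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")

    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy example: three annotators tagging four tokens.
toy = [
    ["NOUN", "NOUN", "NOUN"],
    ["VERB", "VERB", "VERB"],
    ["ADJ", "ADJ", "NOUN"],
    ["NOUN", "NOUN", "NOUN"],
]
print(round(fleiss_kappa(toy), 4))  # 29/41, about 0.7073
```

Kappa corrects raw agreement for agreement expected by chance, which is why a single disagreement on one of four tokens pulls the toy score well below the raw agreement of about 0.83. Other multi-rater coefficients, such as Krippendorff's alpha, would give somewhat different values on the same data.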
Anthology ID: 2026.lrec-main.912
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 11642–11649
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.912/
Cite (ACL): Sanatbek Gayratovich Matlatipov and Mersaid Aripov. 2026. UzUDT: Uzbek Universal Dependencies Treebank. International Conference on Language Resources and Evaluation, main:11642–11649.
Cite (Informal): UzUDT: Uzbek Universal Dependencies Treebank (Matlatipov & Aripov, LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.912.pdf