Universal Dependencies Treebank for Uzbek

Arofat Akhundjanova, Luigi Talamo


Abstract
We present the first Universal Dependencies (UD) treebank for Uzbek, a low-resource language from the Turkic family. The treebank contains 500 sentences (5850 tokens) sourced from the news and fiction genres and is annotated for lemmas, part-of-speech (POS) tags, morphological features, and dependency relations. We describe our methodology for building the treebank, which combines manual and automatic annotation, and discuss some constructions of the Uzbek language that pose challenges to the UD framework.
Anthology ID:
2025.resourceful-1.1
Volume:
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
Month:
March
Year:
2025
Address:
Tallinn, Estonia
Editors:
Špela Arhar Holdt, Nikolai Ilinykh, Barbara Scalvini, Micaella Bruton, Iben Nyholm Debess, Crina Madalina Tudor
Venues:
RESOURCEFUL | WS
Publisher:
University of Tartu Library, Estonia
Pages:
1–6
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.resourceful-1.1/
Cite (ACL):
Arofat Akhundjanova and Luigi Talamo. 2025. Universal Dependencies Treebank for Uzbek. In Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025), pages 1–6, Tallinn, Estonia. University of Tartu Library, Estonia.
Cite (Informal):
Universal Dependencies Treebank for Uzbek (Akhundjanova & Talamo, RESOURCEFUL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.resourceful-1.1.pdf