BBPOS: BERT-based Part-of-Speech Tagging for Uzbek
Latofat Bobojonova, Arofat Akhundjanova, Phil Sidney Ostheimer, Sophie Fellenz
Abstract
This paper advances NLP research for the low-resource Uzbek language in two ways. It evaluates two previously untested monolingual Uzbek BERT models on the part-of-speech (POS) tagging task, and it introduces the first publicly available UPOS-tagged benchmark dataset for Uzbek. Our fine-tuned models achieve 91% average accuracy, outperforming both the multilingual BERT baseline and the rule-based tagger. Notably, unlike existing rule-based taggers, these models capture intermediate POS changes induced by affixes and are sensitive to context.
- Anthology ID:
- 2025.loreslm-1.23
- Volume:
- Proceedings of the First Workshop on Language Models for Low-Resource Languages
- Month:
- January
- Year:
- 2025
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
- Venues:
- LoResLM | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 287–293
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2025.loreslm-1.23/
- Cite (ACL):
- Latofat Bobojonova, Arofat Akhundjanova, Phil Sidney Ostheimer, and Sophie Fellenz. 2025. BBPOS: BERT-based Part-of-Speech Tagging for Uzbek. In Proceedings of the First Workshop on Language Models for Low-Resource Languages, pages 287–293, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- BBPOS: BERT-based Part-of-Speech Tagging for Uzbek (Bobojonova et al., LoResLM 2025)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2025.loreslm-1.23.pdf