Oleksandra Antoniv

2026

Toward a Gold-Standard Benchmark for Evaluating Ukrainian Language Proficiency in LLMs
Svitlana Galeshchuk | Yuliia Maksymiuk | Yuliia Chernobrov | Nina Stankevych | Oleksandra Antoniv | Nataliia Faryna | Oksana Popkova
Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026)

The paper presents an expert-curated benchmark for assessing Ukrainian proficiency in LLMs, focusing on grammar and orthography as core components of language competence. Prepared by professional linguists, the proposed gold-standard dataset is designed to test normative Ukrainian usage. The benchmark is further used to evaluate a range of LLMs, including Ukrainian-focused, multilingual, and large-scale models, under zero-shot and few-shot prompting in Ukrainian and English. Across these settings, smaller models achieve no more than 42.1% accuracy, while large-scale LLMs reach up to 59.6%. These results show that standard Ukrainian remains challenging for current LLMs and highlight the need for stronger language-specific evaluation and adaptation.

Co-authors

Nina Stankevych 1

Venues

UNLP 1
