Svitlana Galeshchuk
2026
Toward a Gold-Standard Benchmark for Evaluating Ukrainian Language Proficiency in LLMs
Svitlana Galeshchuk | Yuliia Maksymiuk | Yuliia Chernobrov | Nina Stankevych | Oleksandra Antoniv | Nataliia Faryna | Oksana Popkova
Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026)
The paper presents an expert-curated benchmark for assessing Ukrainian proficiency in LLMs, focusing on grammar and orthography as core components of language competence. Prepared by professional linguists, the proposed gold-standard dataset is designed to test normative Ukrainian usage. The benchmark is further used to evaluate a range of LLMs, including Ukrainian-focused, multilingual, and large-scale models, under zero-shot and few-shot prompting in Ukrainian and English. Across these settings, smaller models achieve no more than 42.1% accuracy, while large-scale LLMs reach up to 59.6%. These results show that standard Ukrainian remains challenging for current LLMs and highlight the need for stronger language-specific evaluation and adaptation.
2024
Entity Embellishment Mitigation in LLMs Output with Noisy Synthetic Dataset for Alignment
Svitlana Galeshchuk
Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024
The present work focuses on entity embellishment, in which named entities are accompanied by additional information that is not supported by the context or the source material. Our paper contributes to mitigating this problem in texts generated by large language models, summaries in particular, by proposing an approach that injects synthetic noise into generated samples, which are then used to align a fine-tuned LLM. We also address the scarcity of solutions for low-resource languages and test our approach on corpora in Ukrainian.
2023
Abstractive Summarization for the Ukrainian Language: Multi-Task Learning with Hromadske.ua News Dataset
Svitlana Galeshchuk
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)
Despite recent NLP developments, abstractive summarization remains a challenging task, especially in the case of low-resource languages like Ukrainian. The paper aims at improving the quality of summaries produced by mT5 for news in Ukrainian by fine-tuning the model with a mixture of summarization and text similarity tasks using summary-article and title-article training pairs, respectively. The proposed training set-up with small, base, and large mT5 models produces higher-quality summaries. In addition, we present a new Ukrainian dataset for the abstractive summarization task that consists of circa 36.5K articles collected from Hromadske.ua until June 2021.
2019
Sentiment Analysis for Multilingual Corpora
Svitlana Galeshchuk | Ju Qiu | Julien Jourdan
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing
The paper presents a generic approach to the supervised sentiment analysis of social media content in Slavic languages. The method translates documents from the original language into English with Google’s Neural Translation Model. The resulting texts are then converted to vectors by averaging the vector representations of words derived from a pre-trained English Word2Vec model. Testing the approach with several machine learning methods on Polish, Slovenian and Croatian Twitter datasets yields classification accuracy of up to 86% on out-of-sample data.
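The averaging step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding table `TOY_VECTORS`, its three-dimensional vectors, and the helper `average_embedding` are hypothetical stand-ins for a pre-trained English Word2Vec model, and the translation and classification stages are left out.

```python
import numpy as np

# Hypothetical toy embedding table; a real setup would load a pre-trained
# English Word2Vec model instead. The values carry no meaning.
TOY_VECTORS = {
    "good": np.array([1.0, 0.0, 0.5]),
    "bad": np.array([-1.0, 0.0, 0.5]),
    "movie": np.array([0.0, 1.0, 0.0]),
}


def average_embedding(tokens, vectors, dim):
    """Return the mean vector of the tokens found in `vectors`.

    Tokens missing from the table are skipped; if no token is known,
    a zero vector of length `dim` is returned.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)


# A translated (English) document, tokenized by whitespace for simplicity.
doc_vector = average_embedding("good movie".split(), TOY_VECTORS, dim=3)
empty_vector = average_embedding(["unseen"], TOY_VECTORS, dim=3)
```

The resulting fixed-length vector is what a downstream classifier would receive as input. Skipping unknown tokens is one possible design choice, assumed here for simplicity.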