Improving Sentiment Analysis for Ukrainian Social Media Code-Switching Data

Yurii Shynkarov, Veronika Solopova, Vera Schmitt


Abstract
This paper addresses the challenges of sentiment analysis in Ukrainian social media, where users frequently code-switch between Ukrainian, Russian, and other languages. We introduce COSMUS (COde-Switched MUltilingual Sentiment for Ukrainian Social media), a corpus of 12,224 texts collected from Telegram channels, product-review sites, and open datasets, annotated with sentiment classes (positive, negative, neutral, mixed) and language labels (Ukrainian, Russian, code-switched). We benchmark three modeling paradigms: (i) few-shot prompting of GPT-4o and DeepSeek V2-chat, (ii) multilingual mBERT, and (iii) the Ukrainian-centric UkrRoberta. We also analyze the calibration and LIME scores of the latter two models to verify their performance across the language labels. To mitigate data sparsity, we test two augmentation strategies: back-translation consistently hurts performance, whereas a Large Language Model (LLM) word-substitution scheme improves accuracy by up to 2.2%. Our work delivers the first publicly available dataset and comprehensive benchmark for sentiment classification in Ukrainian code-switching media. The results demonstrate that language-specific pre-training combined with targeted augmentation yields the most accurate and trustworthy predictions in this challenging low-resource setting.
Anthology ID:
2025.unlp-1.18
Volume:
Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria (online)
Editor:
Mariana Romanyshyn
Venues:
UNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
179–193
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.unlp-1.18/
Cite (ACL):
Yurii Shynkarov, Veronika Solopova, and Vera Schmitt. 2025. Improving Sentiment Analysis for Ukrainian Social Media Code-Switching Data. In Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025), pages 179–193, Vienna, Austria (online). Association for Computational Linguistics.
Cite (Informal):
Improving Sentiment Analysis for Ukrainian Social Media Code-Switching Data (Shynkarov et al., UNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.unlp-1.18.pdf