Exploring Hate Speech Detection Models for Lithuanian Language

Justina Mandravickaitė, Eglė Rimkienė, Mindaugas Petkevičius, Milita Songailaitė, Eimantas Zaranka, Tomas Krilavičius


Abstract
Online hate speech poses a significant challenge, as it can incite violence and contribute to social polarization. This study evaluates traditional machine learning, deep learning and large language models (LLMs) for Lithuanian hate speech detection, addressing the class imbalance issue via data augmentation and resampling techniques. Our dataset included 27,358 user-generated comments, annotated as Neutral language (56%), Offensive language (29%) and Hate speech (15%). We trained BiLSTM, LSTM, CNN, SVM, and Random Forest models and fine-tuned Multilingual BERT, LitLat BERT, Electra, RWKV, ChatGPT, LT-Llama-2, and Gemma-2 models. Additionally, we pre-trained Electra for Lithuanian. Models were evaluated using accuracy and weighted F1-score. On the imbalanced dataset, LitLat BERT (0.76 weighted F1-score) and Multilingual BERT (0.73 weighted F1-score) performed best. Over-sampling further boosted weighted F1-scores, with Multilingual BERT (0.85) and LitLat BERT (0.84) outperforming other models. Over-sampling combined with augmentation provided the best overall results. Under-sampling led to performance declines and was less effective. Finally, fine-tuning LLMs improved their accuracy, which highlights the importance of fine-tuning for more specialized NLP tasks.
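To make the two technical ingredients of the abstract concrete, the following is a minimal sketch of random over-sampling and of the weighted F1-score, written in plain Python. It is not the authors' code: the abstract does not say which resampling procedure or library was used, so the function names `random_oversample` and `weighted_f1`, the fixed seed, and the choice of simple random duplication are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples at random until every class
    has as many examples as the largest class (illustrative assumption,
    not necessarily the procedure used in the paper)."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies with replacement to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's
    share of the true labels."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive because n > 0.
        f1 = 2 * tp / (2 * tp + fp + fn)
        total += f1 * n
    return total / len(y_true)


# Toy data mirroring the reported 56% / 29% / 15% class split.
labels = ["neutral"] * 56 + ["offensive"] * 29 + ["hate"] * 15
texts = [f"comment {i}" for i in range(len(labels))]
balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

In this toy setup each class ends up with 56 examples. A common precaution, stated here as general practice rather than as something the abstract reports, is to over-sample only the training split so that duplicated comments do not leak into the evaluation data.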
Anthology ID:
2025.woah-1.18
Volume:
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Agostina Calabrese, Christine de Kock, Debora Nozza, Flor Miriam Plaza-del-Arco, Zeerak Talat, Francielle Vargas
Venues:
WOAH | WS
Publisher:
Association for Computational Linguistics
Pages:
206–218
URL:
https://preview.aclanthology.org/landing_page/2025.woah-1.18/
Cite (ACL):
Justina Mandravickaitė, Eglė Rimkienė, Mindaugas Petkevičius, Milita Songailaitė, Eimantas Zaranka, and Tomas Krilavičius. 2025. Exploring Hate Speech Detection Models for Lithuanian Language. In Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH), pages 206–218, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Exploring Hate Speech Detection Models for Lithuanian Language (Mandravickaitė et al., WOAH 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.woah-1.18.pdf
Supplementary material:
2025.woah-1.18.SupplementaryMaterial.zip