Exploring Hate Speech Detection Models for Lithuanian Language
Justina Mandravickaitė, Eglė Rimkienė, Mindaugas Petkevičius, Milita Songailaitė, Eimantas Zaranka, Tomas Krilavičius
Abstract
Online hate speech poses a significant challenge, as it can incite violence and contribute to social polarization. This study evaluates traditional machine learning, deep learning, and large language models (LLMs) for Lithuanian hate speech detection, addressing the class imbalance issue through data augmentation and resampling techniques. Our dataset included 27,358 user-generated comments, annotated as Neutral language (56%), Offensive language (29%), and Hate speech (15%). We trained BiLSTM, LSTM, CNN, SVM, and Random Forest models and fine-tuned Multilingual BERT, LitLat BERT, Electra, RWKV, ChatGPT, LT-Llama-2, and Gemma-2 models. Additionally, we pre-trained Electra for Lithuanian. Models were evaluated using accuracy and weighted F1-score. On the imbalanced dataset, LitLat BERT (0.76 weighted F1-score) and Multilingual BERT (0.73 weighted F1-score) performed best. Over-sampling further boosted weighted F1-scores, with Multilingual BERT (0.85) and LitLat BERT (0.84) outperforming other models. Over-sampling combined with augmentation provided the best overall results. Under-sampling led to performance declines and was less effective. Finally, fine-tuning LLMs improved their accuracy, highlighting the importance of fine-tuning for more specialized NLP tasks.
- Anthology ID: 2025.woah-1.18
- Volume: Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
- Month: August
- Year: 2025
- Address: Vienna, Austria
- Editors: Agostina Calabrese, Christine de Kock, Debora Nozza, Flor Miriam Plaza-del-Arco, Zeerak Talat, Francielle Vargas
- Venues: WOAH | WS
- Publisher: Association for Computational Linguistics
- Pages: 206–218
- URL: https://preview.aclanthology.org/landing_page/2025.woah-1.18/
- Cite (ACL): Justina Mandravickaitė, Eglė Rimkienė, Mindaugas Petkevičius, Milita Songailaitė, Eimantas Zaranka, and Tomas Krilavičius. 2025. Exploring Hate Speech Detection Models for Lithuanian Language. In Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH), pages 206–218, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal): Exploring Hate Speech Detection Models for Lithuanian Language (Mandravickaitė et al., WOAH 2025)
- PDF: https://preview.aclanthology.org/landing_page/2025.woah-1.18.pdf
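The abstract reports results as weighted F1-scores and compares over-sampling with under-sampling on a three-class dataset (Neutral 56%, Offensive 29%, Hate speech 15%). The Python sketch below illustrates those two ideas on toy labels with the same proportions. It is not the authors' code: the abstract does not say which resampling method was used, so plain random over-sampling (duplicating minority-class comments with replacement) and random under-sampling are assumptions here, and the function names `weighted_f1`, `random_oversample` and `random_undersample` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def weighted_f1(y_true, y_pred):
    """F1 per class, averaged with weights equal to each class's share of y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_true / total
    return score


def _group_by_label(examples, labels):
    """Collect the examples belonging to each label, preserving input order."""
    groups = defaultdict(list)
    for x, y in zip(examples, labels):
        groups[y].append(x)
    return groups


def random_oversample(examples, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen examples (with replacement)
    from smaller classes until every class is as large as the largest one."""
    rng = random.Random(seed)
    groups = _group_by_label(examples, labels)
    target = max(len(xs) for xs in groups.values())
    out_x, out_y = [], []
    for y, xs in groups.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def random_undersample(examples, labels, seed=0):
    """Hypothetical helper: keep a random subset (without replacement) of each
    class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(examples, labels)
    target = min(len(xs) for xs in groups.values())
    out_x, out_y = [], []
    for y, xs in groups.items():
        out_x.extend(rng.sample(xs, target))
        out_y.extend([y] * target)
    return out_x, out_y


if __name__ == "__main__":
    # Toy labels with the class proportions reported in the abstract (100 comments).
    labels = ["neutral"] * 56 + ["offensive"] * 29 + ["hate"] * 15
    texts = [f"comment {i}" for i in range(len(labels))]

    _, over_y = random_oversample(texts, labels)
    print(sorted(Counter(over_y).items()))   # [('hate', 56), ('neutral', 56), ('offensive', 56)]

    _, under_y = random_undersample(texts, labels)
    print(sorted(Counter(under_y).items()))  # [('hate', 15), ('neutral', 15), ('offensive', 15)]

    # A baseline that always predicts the majority class.
    majority = ["neutral"] * len(labels)
    print(round(weighted_f1(labels, majority), 3))  # 0.402
```

With these toy labels, a classifier that always predicts Neutral reaches 0.56 accuracy but only about 0.40 weighted F1-score, because the Offensive and Hate speech classes contribute an F1 of zero. Resampling is normally applied to the training split only, so that duplicated comments do not leak into the evaluation data.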