Abusive Speech Detection in Serbian using Machine Learning

Danka Jokić, Ranka Stanković, Branislava Šandrih Todorović


Abstract
The increase in the use of abusive language on social media and virtual platforms has emphasized the importance of developing efficient hate speech detection systems. While there have been considerable advancements in creating such systems for the English language, resources are scarce for other languages, such as Serbian. This research paper explores the use of machine learning and deep learning techniques to identify abusive language in Serbian text. The authors used AbCoSER, a dataset of Serbian tweets that have been labeled as abusive or non-abusive. They evaluated various algorithms to classify tweets, and the best-performing model is based on the deep learning transformer architecture. The model attained an F1 macro score of 0.827, a figure that is commensurate with the benchmarks established for offensive speech datasets of a similar magnitude in other languages.
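For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro-averaged F1 score is computed for a two-class abusive/non-abusive task. It is not the authors' evaluation code: the function name `macro_f1` and the toy labels are hypothetical and chosen only for illustration, and they are not drawn from AbCoSER.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    per_class = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy, made-up labels (illustration only, not real data).
y_true = ["abusive", "abusive", "non-abusive",
          "non-abusive", "non-abusive", "non-abusive"]
y_pred = ["abusive", "non-abusive", "non-abusive",
          "non-abusive", "non-abusive", "abusive"]

print(macro_f1(y_true, y_pred))  # 0.625 = (0.5 + 0.75) / 2
```

Because each class contributes equally to the average, the macro score penalizes poor performance on the less frequent class more than plain accuracy does, which is one reason it is commonly reported for imbalanced abusive-language datasets.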
Anthology ID:
2024.nlpaics-1.18
Volume:
Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Month:
July
Year:
2024
Address:
Lancaster, UK
Editors:
Ruslan Mitkov, Saad Ezzini, Tharindu Ranasinghe, Ignatius Ezeani, Nouran Khallaf, Cengiz Acarturk, Matthew Bradbury, Mo El-Haj, Paul Rayson
Venue:
NLPAICS
Publisher:
International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security
Pages:
153–163
URL:
https://preview.aclanthology.org/fix-sig-urls/2024.nlpaics-1.18/
Cite (ACL):
Danka Jokić, Ranka Stanković, and Branislava Šandrih Todorović. 2024. Abusive Speech Detection in Serbian using Machine Learning. In Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, pages 153–163, Lancaster, UK. International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security.
Cite (Informal):
Abusive Speech Detection in Serbian using Machine Learning (Jokić et al., NLPAICS 2024)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2024.nlpaics-1.18.pdf