MSM_CUET@DravidianLangTech 2025: XLM-BERT and MuRIL Based Transformer Models for Detection of Abusive Tamil and Malayalam Text Targeting Women on Social Media

Md Mizanur Rahman; Srijita Dhar; Md Mehedi Hasan; Hasan Murad

doi:10.18653/v1/2025.dravidianlangtech-1.42

Abstract

Social media has become a prominent platform for sharing ideas, viewpoints, and experiences. However, its reach has also brought serious problems, including online abuse. Abusive language is pervasive on social media and is often targeted at specific groups such as women. Detecting abusive text is particularly difficult for low-resource languages such as Tamil, Malayalam, and other Dravidian languages, so it is crucial to address this issue for them. This paper presents a novel approach to detecting abusive Tamil and Malayalam text targeting women on social media. DravidianLangTech at NAACL 2025 organized a shared task on Abusive Tamil and Malayalam Text Targeting Women on Social Media Detection, and the organizers provided an annotated dataset with two classes: Abusive and Non-Abusive. We experimented with several transformer-based models, namely XLM-R, MuRIL, IndicBERT, and mBERT, as well as an ensemble method with SVM and Random Forest. We selected XLM-R (XLM-RoBERTa) for Tamil text and MuRIL for Malayalam text because they outperformed the other models. We then evaluated our models on the DravidianLangTech@NAACL 2025 shared task dataset. XLM-R gave the best result for abusive Tamil text detection, with an F1 score of 0.7873 on the test set, ranking 2nd among all participants. MuRIL gave the best result for abusive Malayalam text detection, with an F1 score of 0.6812, ranking 10th among all participants.
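The abstract reports results as F1 scores over two classes, Abusive and Non-Abusive. As a rough illustration of how such a score can be computed, the sketch below averages per-class F1 over both labels (macro averaging). The averaging scheme is an assumption, since the abstract only says "F1 score"; the function names and the toy labels are hypothetical and are not taken from the shared-task data or the authors' code.

```python
from typing import Sequence

# The two classes named in the abstract.
LABELS = ("Abusive", "Non-Abusive")


def f1_for_label(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 for one label, treating that label as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of per-label F1 (assumed averaging; not confirmed by the source)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)


# Toy, made-up labels purely for illustration.
gold = ["Abusive", "Abusive", "Non-Abusive", "Non-Abusive"]
pred = ["Abusive", "Non-Abusive", "Non-Abusive", "Non-Abusive"]
score = macro_f1(gold, pred)
print(f"macro F1 = {score:.4f}")
```

On the toy data, one abusive post is missed, which lowers recall for the Abusive class and precision for the Non-Abusive class; macro averaging weights both classes equally regardless of how many examples each has.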

Anthology ID: 2025.dravidianlangtech-1.42
Volume: Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month: May
Year: 2025
Address: Acoma, The Albuquerque Convention Center, Albuquerque, New Mexico
Editors: Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Elizabeth Sherly, Saranya Rajiakodi, Balasubramanian Palani, Malliga Subramanian, Subalalitha Cn, Dhivya Chinnappa
Venues: DravidianLangTech | WS
Publisher: Association for Computational Linguistics
Pages: 243–247
URL: https://preview.aclanthology.org/moar-dois/2025.dravidianlangtech-1.42/
DOI: 10.18653/v1/2025.dravidianlangtech-1.42
Cite (ACL): Md Mizanur Rahman, Srijita Dhar, Md Mehedi Hasan, and Hasan Murad. 2025. MSM_CUET@DravidianLangTech 2025: XLM-BERT and MuRIL Based Transformer Models for Detection of Abusive Tamil and Malayalam Text Targeting Women on Social Media. In Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 243–247, Acoma, The Albuquerque Convention Center, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): MSM_CUET@DravidianLangTech 2025: XLM-BERT and MuRIL Based Transformer Models for Detection of Abusive Tamil and Malayalam Text Targeting Women on Social Media (Rahman et al., DravidianLangTech 2025)
PDF: https://preview.aclanthology.org/moar-dois/2025.dravidianlangtech-1.42.pdf
