CUET_Agile@DravidianLangTech 2025: Fine-tuning Transformers for Detecting Abusive Text Targeting Women from Tamil and Malayalam Texts

Tareque Md Hanif, Md Rashadur Rahman


Abstract
As social media has grown, so has online abuse, with women often facing harmful online behavior. This discourages their free participation and expression online. This paper outlines the approach adopted by our team for detecting abusive comments in Tamil and Malayalam. The task focuses on classifying whether a given comment contains abusive language towards women. We experimented with transformer based models by fine-tuning Tamil-BERT for Tamil and Malayalam-BERT for Malayalam. Additionally, we fine-tuned IndicBERT v2 on both Tamil and Malayalam datasets. To evaluate the effect of pre-processing, we also conducted experiments using non-preprocessed text. Results demonstrate that IndicBERT v2 outperformed the language-specific BERT models in both languages. Pre-processing the data showed mixed results, with a slight improvement in the Tamil dataset but no significant benefit for the Malayalam dataset. Our approach secured first place in Tamil with a macro F1-score of 0.7883 and second place in Malayalam with a macro F1-score of 0.7234. The implementation details of the task will be found in the GitHub repository.
Anthology ID:
2025.dravidianlangtech-1.55
Volume:
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month:
May
Year:
2025
Address:
Acoma, The Albuquerque Convention Center, Albuquerque, New Mexico
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Elizabeth Sherly, Saranya Rajiakodi, Balasubramanian Palani, Malliga Subramanian, Subalalitha Cn, Dhivya Chinnappa
Venues:
DravidianLangTech | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
315–319
Language:
URL:
https://preview.aclanthology.org/landing_page/2025.dravidianlangtech-1.55/
DOI:
Bibkey:
Cite (ACL):
Tareque Md Hanif and Md Rashadur Rahman. 2025. CUET_Agile@DravidianLangTech 2025: Fine-tuning Transformers for Detecting Abusive Text Targeting Women from Tamil and Malayalam Texts. In Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 315–319, Acoma, The Albuquerque Convention Center, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
CUET_Agile@DravidianLangTech 2025: Fine-tuning Transformers for Detecting Abusive Text Targeting Women from Tamil and Malayalam Texts (Hanif & Rahman, DravidianLangTech 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/landing_page/2025.dravidianlangtech-1.55.pdf