DPR@DravidianLangTech 2026: Transformer-Based Abusive Content Detection for Tamil Text Targeting Women on Social Media

Diya Prakash; Praveen Kumar S; R Ranjith Kumar; Balasubramanian Palani; Jobin Jose; Siranjeevi Rajamanickam

Abstract

The rapid growth of Tamil content on social media has been accompanied by an increase in abusive and gender-directed hate speech on online platforms. Detecting abusive content written in Tamil is difficult because of the language's complex morphology, its dialects, the use of transliteration, and context-dependent usage. This study explores the use of transformer-based pretrained language models for detecting abusive content in Tamil. Five transformer-based models (mBERT, MuRIL, XLM-RoBERTa, IndicBERT, and Tamil-BERT) were fine-tuned and evaluated on the DravidianLangTech 2026 shared task dataset. The experimental results show that Tamil-BERT was the best-performing model, with an accuracy of 80.72%, a result attributed to its Tamil-specific pretraining and better handling of Tamil morphology. Our system ranked 5th on the leaderboard of the DravidianLangTech 2026 shared task. The source code and fine-tuned models are open source and publicly available.
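The comparison described in the abstract, scoring several fine-tuned models by accuracy on the same evaluation set and keeping the best one, can be sketched as follows. This is a minimal illustration and not the authors' code: the label strings, the helper names `accuracy` and `best_model`, and the toy predictions are all hypothetical, and the toy numbers are unrelated to the reported 80.72%.

```python
# Hypothetical label strings; the shared task's actual label names may differ.
ABUSIVE = "Abusive"
NON_ABUSIVE = "Non-Abusive"


def accuracy(gold, pred):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("cannot score an empty evaluation set")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def best_model(gold, predictions_by_model):
    """Return (model_name, accuracy) for the model with the highest accuracy."""
    scores = {
        name: accuracy(gold, pred)
        for name, pred in predictions_by_model.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


# Toy data for illustration only; these are not results from the paper.
toy_gold = [ABUSIVE, NON_ABUSIVE, ABUSIVE, NON_ABUSIVE]
toy_predictions = {
    "model_a": [ABUSIVE, NON_ABUSIVE, NON_ABUSIVE, NON_ABUSIVE],
    "model_b": [ABUSIVE, NON_ABUSIVE, ABUSIVE, NON_ABUSIVE],
}

if __name__ == "__main__":
    print(best_model(toy_gold, toy_predictions))
```

In practice, each entry in `predictions_by_model` would hold the labels predicted by one fine-tuned model on the held-out split, so that all five models are scored against the same gold labels.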

Anthology ID: 2026.dravidianlangtech-1.35
Volume: Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month: July
Year: 2026
Address: Underline (Virtual)
Editors: Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Saranya Rajiakodi, Subalalitha Navaneethakrishnan, Dhivya Chinnappa, Balasubramanian Palani, Malliga Subramanian, Kogilavani Shanmugavadivel, Ratnavel Rajalakshmi
Venues: DravidianLangTech | WS
Publisher: Association for Computational Linguistics
Pages: 242–247
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.35/
Cite (ACL): Diya Prakash, Praveen Kumar S, R Ranjith Kumar, Balasubramanian Palani, Jobin Jose, and Siranjeevi Rajamanickam. 2026. DPR@DravidianLangTech 2026: Transformer-Based Abusive Content Detection for Tamil Text Targeting Women on Social Media. In Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 242–247, Underline (Virtual). Association for Computational Linguistics.
Cite (Informal): DPR@DravidianLangTech 2026: Transformer-Based Abusive Content Detection for Tamil Text Targeting Women on Social Media (Prakash et al., DravidianLangTech 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.35.pdf
