DPR@DravidianLangTech 2026: Transformer-Based Abusive Content Detection for Tamil Text Targeting Women on Social Media
Diya Prakash, Praveen Kumar S, R Ranjith Kumar, Balasubramanian Palani, Jobin Jose, Siranjeevi Rajamanickam
Abstract
The fast-growing number of content in Tamil in social media has led to increasing abusive and gender-directed hate speech in online platforms. Detecting abusive content written in Tamil is relatively difficult owing to the complex morphological structure of Tamil language, its dialects, transliteration, and contextualized usage. In this study, the use of transformer-based pretrained language models in detecting abusive content in Tamil was explored. Five transformer-based models—mBERT, MuRIL, XLM-RoBERTa, IndicBERT, and Tamil-BERT—were fine-tuned and tested using DravidianLangTech 2026 shared task dataset. The experimental results show that the best-performing model was Tamil-BERT with an accuracy rate of 80.72% owing to Tamil-specific pretraining and better morphological analysis capabilities. Our system ranks 5th at the leaderboard of the DravidianLangTech 2026 shared task challenge. The source code and fine-tuned models are opensource and publicly accessible.- Anthology ID:
- 2026.dravidianlangtech-1.35
- Volume:
- Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
- Month:
- July
- Year:
- 2026
- Address:
- Underline (Virtual)
- Editors:
- Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Saranya Rajiakodi, Subalalitha Navaneethakrishnan, Dhivya Chinnappa, Balasubramanian Palani, Malliga Subramanian, Kogilavani Shanmugavadivel, Ratnavel Rajalakshmi
- Venues:
- DravidianLangTech | WS
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 242–247
- Language:
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.35/
- DOI:
- Cite (ACL):
- Diya Prakash, Praveen Kumar S, R Ranjith Kumar, Balasubramanian Palani, Jobin Jose, and Siranjeevi Rajamanickam. 2026. DPR@DravidianLangTech 2026: Transformer-Based Abusive Content Detection for Tamil Text Targeting Women on Social Media. In Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 242–247, Underline (Virtual). Association for Computational Linguistics.
- Cite (Informal):
- DPR@DravidianLangTech 2026: Transformer-Based Abusive Content Detection for Tamil Text Targeting Women on Social Media (Prakash et al., DravidianLangTech 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.35.pdf