KEC’S CODE CRAFTERS@DravidianLangTech 2026: Abusive Tamil Text Detection Targeting Women on Social Media

Nivetha, Nethrasri S, Malliga Subramanian


Abstract
As social media platforms continue to grow in size, they have unfortunately also become a hub for digital toxicity, where women in linguistically diverse regions are particularly vulnerable to online harassment. Hence, an automated moderation tool that can effectively handle regional languages is critically needed. Our paper is a step in this direction: we propose a classification model for the “Abusive Tamil Text Detection Targeting Women on Social Media” shared task at DravidianLangTech-2026. The dataset comprises 25,948 comments for training and 915 for testing. Our primary objective was to classify comments on YouTube videos as either “Abusive” or “Non-Abusive”. The Tamil language is particularly difficult to work with owing to its highly agglutinative structure and the tendency toward code-mixing, with Tamil and English often mixed within a single sentence. To overcome these difficulties, we designed a dedicated preprocessing pipeline for denoising this informal text. We then implemented four traditional machine learning models, SVM, Logistic Regression, Random Forest, and Multinomial Naive Bayes, using TF-IDF for feature extraction. After tuning hyperparameters and decision thresholds, Logistic Regression achieved an accuracy and F1 score of 0.86.
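The abstract does not spell out the denoising pipeline or the threshold search, so the following is only a minimal sketch of what those two steps might look like, not the authors' implementation. The function names `clean_comment` and `best_threshold`, the specific cleaning rules (removing URLs, mentions, emoji, and punctuation, and collapsing elongated letters), the threshold grid, and the use of binary F1 on the abusive class are all assumptions made for illustration.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Three or more repeats of the same letter (digits are left alone).
REPEAT_RE = re.compile(r"([^\W\d_])\1{2,}")
SPACE_RE = re.compile(r"\s+")


def clean_comment(text):
    """Hypothetical denoising step for code-mixed Tamil-English comments."""
    text = unicodedata.normalize("NFC", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Keep letters, combining marks (e.g. Tamil vowel signs), and digits;
    # everything else (emoji, punctuation, symbols) becomes a space.
    text = "".join(
        ch if unicodedata.category(ch)[0] in ("L", "M", "N") else " "
        for ch in text
    )
    text = REPEAT_RE.sub(r"\1\1", text)  # "sooooo" -> "soo"
    return SPACE_RE.sub(" ", text).strip().lower()


def f1_score(y_true, y_pred):
    """Binary F1 for the positive (assumed: abusive = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, probs, grid=None):
    """Sweep decision thresholds and return (threshold, F1) with the best F1."""
    grid = grid or [i / 100 for i in range(5, 96)]
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:  # ties keep the lowest threshold
            best_t, best_f1 = t, score
    return best_t, best_f1
```

In a setup like the one described, `clean_comment` would run before TF-IDF feature extraction, and `best_threshold` would be applied to a classifier's predicted probabilities on held-out validation data rather than on the test set.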
Anthology ID:
2026.dravidianlangtech-1.43
Volume:
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month:
July
Year:
2026
Address:
Underline (Virtual)
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Saranya Rajiakodi, Subalalitha Navaneethakrishnan, Dhivya Chinnappa, Balasubramanian Palani, Malliga Subramanian, Kogilavani Shanmugavadivel, Ratnavel Rajalakshmi
Venues:
DravidianLangTech | WS
Publisher:
Association for Computational Linguistics
Pages:
284–288
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.43/
Cite (ACL):
Nivetha, Nethrasri S, and Malliga Subramanian. 2026. KEC’S CODE CRAFTERS@DravidianLangTech 2026: Abusive Tamil Text Detection Targeting Women on Social Media. In Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 284–288, Underline (Virtual). Association for Computational Linguistics.
Cite (Informal):
KEC’S CODE CRAFTERS@DravidianLangTech 2026: Abusive Tamil Text Detection Targeting Women on Social Media (Nivetha et al., DravidianLangTech 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.43.pdf