TriVector@DravidianLangTech 2026: Abusive Tamil Text Detection on Social Media Using Lexicon-Augmented Transformers
Oarisa Rebayet, Tahmima Hoque Eid, Fawzia Tabassum, Hasan Murad
Abstract
Abusive comment detection in low-resource languages poses significant challenges, particularly when targeting gender-based abuse on social media platforms. This work presents our system for ‘Abusive Tamil text targeting women on social media’ at DravidianLangTech@ACL 2026. We introduce nine handcrafted lexicon features, integrate them with pretrained multilingual transformer embeddings, and evaluate their effectiveness in classifying Tamil online comments as abusive or non-abusive. To measure the impact of these features, we compare model performance with and without them across multiple transformer architectures. Our best-performing model, XLM-RoBERTa-Large, achieved a macro F1-score of 81.71%, ranking 15th in the competition. The findings indicate that larger multilingual models generalize more effectively to unseen data than smaller domain-specific models, while the addition of lexical features yields only modest gains.
- Anthology ID:
- 2026.dravidianlangtech-1.67
- Volume:
- Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
- Month:
- July
- Year:
- 2026
- Address:
- Underline (Virtual)
- Editors:
- Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Saranya Rajiakodi, Subalalitha Navaneethakrishnan, Dhivya Chinnappa, Balasubramanian Palani, Malliga Subramanian, Kogilavani Shanmugavadivel, Ratnavel Rajalakshmi
- Venues:
- DravidianLangTech | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 420–428
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.67/
- Cite (ACL):
- Oarisa Rebayet, Tahmima Hoque Eid, Fawzia Tabassum, and Hasan Murad. 2026. TriVector@DravidianLangTech 2026: Abusive Tamil Text Detection on Social Media Using Lexicon-Augmented Transformers. In Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 420–428, Underline (Virtual). Association for Computational Linguistics.
- Cite (Informal):
- TriVector@DravidianLangTech 2026: Abusive Tamil Text Detection on Social Media Using Lexicon-Augmented Transformers (Rebayet et al., DravidianLangTech 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.67.pdf