Abusive Content Detection in Telugu-English Code-Mixed Social Media Using Hybrid Transformer Architectures

Bojja Revanth Reddy, Sivaiah Bellamkonda


Abstract
The rapid growth of social media platforms has produced a substantial increase in user-generated content, including abusive and offensive language. Detecting such content is particularly challenging in low-resource, code-mixed language settings such as Telugu-English social media text. Code-mixed content involves transliteration, inconsistent spelling, informal expressions, and frequent language switching within a single sentence. This paper addresses abusive content detection in Telugu-English code-mixed comments using both traditional machine learning and transformer-based deep learning models. The proposed approach comprises preprocessing strategies that normalize transliterations and spelling variations, hybrid feature extraction that combines TF-IDF and FastText embeddings, and fine-tuning of multilingual transformer models. The study addresses challenges such as morphological complexity, contextual ambiguity, and limited annotated data in low-resource NLP environments.
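The preprocessing and hybrid feature steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the variant map `VARIANTS`, the function names, and the vector size are hypothetical, and real FastText embeddings are replaced here by a hashed character-trigram stand-in so that the example runs with the Python standard library alone. It only shows the general shape of the idea: normalize romanized spellings, then concatenate sparse TF-IDF weights with a dense subword-based vector.

```python
import math
import re
import string
import zlib
from collections import Counter

# Hypothetical variant map: alternative romanized spellings -> one canonical form.
VARIANTS = {"chala": "chaala", "ledhu": "ledu"}


def normalize(text):
    """Lowercase, squeeze character elongations, and map spelling variants."""
    text = text.lower()
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # runs of 3+ repeats -> 2
    tokens = [t.strip(string.punctuation) for t in text.split()]
    return [VARIANTS.get(t, t) for t in tokens if t]


def tfidf_vectors(docs):
    """Return the sorted vocabulary and one smoothed TF-IDF row per document."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


def subword_vector(token, dim=8):
    """Stand-in for a FastText vector: hashed character trigrams, normalized."""
    padded = f"<{token}>"
    grams = [padded[i:i + 3] for i in range(len(padded) - 2)]
    vec = [0.0] * dim
    for g in grams:
        vec[zlib.crc32(g.encode("utf-8")) % dim] += 1.0
    return [v / len(grams) for v in vec]


def mean_vector(vectors, dim):
    """Average a list of equal-length vectors; zeros for an empty document."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def hybrid_features(comments, dim=8):
    """Concatenate TF-IDF weights with the mean subword vector per comment."""
    docs = [normalize(c) for c in comments]
    vocab, sparse = tfidf_vectors(docs)
    feats = []
    for d, row in zip(docs, sparse):
        dense = mean_vector([subword_vector(t, dim) for t in d], dim)
        feats.append(row + dense)
    return vocab, feats


comments = ["Nuvvu chaaaala waste fellow!!!", "chala manchi video"]
vocab, feats = hybrid_features(comments)
print(vocab)
print(len(feats[0]))  # vocabulary size plus dense dimension
```

In a fuller pipeline, the dense half would come from a trained FastText model and the concatenated vectors would feed a conventional classifier, while the transformer models would be fine-tuned on the normalized text directly.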
Anthology ID:
2026.dravidianlangtech-1.1
Volume:
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month:
July
Year:
2026
Address:
Underline (Virtual)
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Saranya Rajiakodi, Subalalitha Navaneethakrishnan, Dhivya Chinnappa, Balasubramanian Palani, Malliga Subramanian, Kogilavani Shanmugavadivel, Ratnavel Rajalakshmi
Venues:
DravidianLangTech | WS
Publisher:
Association for Computational Linguistics
Pages:
1–5
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.1/
Cite (ACL):
Bojja Revanth Reddy and Sivaiah Bellamkonda. 2026. Abusive Content Detection in Telugu-English Code-Mixed Social Media Using Hybrid Transformer Architectures. In Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 1–5, Underline (Virtual). Association for Computational Linguistics.
Cite (Informal):
Abusive Content Detection in Telugu-English Code-Mixed Social Media Using Hybrid Transformer Architectures (Reddy & Bellamkonda, DravidianLangTech 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.1.pdf