Sivaiah Bellamkonda
2026
Abusive Content Detection in Telugu-English Code-Mixed Social Media Using Hybrid Transformer Architectures
Bojja Revanth Reddy | Sivaiah Bellamkonda
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
The rapid growth of social media platforms has led to a substantial increase in user-generated content, including abusive and offensive language. Detecting abusive content is particularly challenging in low-resource, code-mixed settings such as Telugu-English social media text. Code-mixed content involves transliteration, inconsistent spelling, informal expressions, and frequent language switching within a single sentence. This paper focuses on detecting abusive content in Telugu-English code-mixed comments using both traditional machine learning and transformer-based deep learning models. The proposed approach combines preprocessing strategies that normalize transliterations and spelling variations, hybrid feature extraction that combines TF-IDF and FastText embeddings, and fine-tuning of multilingual transformer models. The study addresses challenges such as morphological complexity, contextual ambiguity, and limited annotated data in low-resource NLP settings.
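
The abstract does not spell out the preprocessing or feature pipeline, so the following is only a minimal sketch of what spelling normalization plus hybrid TF-IDF and embedding features could look like. Everything in it is an assumption made for illustration: the function names, the rule that collapses runs of three or more identical characters, the smoothed IDF formula, simple concatenation as the way of combining the two feature types, the made-up example comments, and the two-dimensional `TOY_VECTORS` table, which stands in for real FastText vectors (unlike FastText, it has no subword handling for unseen words).

```python
import math
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, collapse runs of 3+ identical characters to one, strip edge punctuation."""
    text = re.sub(r"(.)\1{2,}", r"\1", text.lower())
    tokens = (tok.strip(string.punctuation) for tok in text.split())
    return [tok for tok in tokens if tok]

def fit_idf(docs):
    """Smoothed inverse document frequency for every token in a tokenized corpus."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}

def tfidf_vector(doc, idf, vocab):
    """Sparse-style lexical features: raw term count times IDF, one slot per vocab token."""
    counts = Counter(doc)
    return [counts[tok] * idf[tok] for tok in vocab]

def mean_embedding(doc, vectors, dim):
    """Average the vectors of known tokens; all zeros if no token is known."""
    known = [vectors[tok] for tok in doc if tok in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]

def hybrid_features(doc, idf, vocab, vectors, dim):
    """Concatenate lexical (TF-IDF) and dense (embedding) features into one vector."""
    return tfidf_vector(doc, idf, vocab) + mean_embedding(doc, vectors, dim)

# Hypothetical Romanized comments and toy vectors, not data from the paper.
corpus = ["Nuvvu chaaalaaa WASTE ra!!!", "Movie chala bagundi", "waste movie ra"]
docs = [normalize(comment) for comment in corpus]
idf = fit_idf(docs)
vocab = sorted(idf)
TOY_VECTORS = {"waste": [1.0, 0.0], "chala": [0.0, 1.0], "bagundi": [-1.0, 0.5]}
features = [hybrid_features(doc, idf, vocab, TOY_VECTORS, 2) for doc in docs]
```

Each row of `features` could then be passed to a conventional classifier. The normalization step here maps the elongated spelling `chaaalaaa` onto `chala`, so both comments that contain the word share one TF-IDF slot and one embedding lookup.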