Bojja Revanth Reddy
2026
Abusive Content Detection in Telugu-English Code-Mixed Social Media Using Hybrid Transformer Architectures
Bojja Revanth Reddy | Sivaiah Bellamkonda
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
The rapid growth of social media platforms has led to a substantial increase in user-generated content, including abusive and offensive language. Detecting abusive content becomes particularly challenging in low-resource and code-mixed language settings such as Telugu-English social media text. Code-mixed content involves transliteration, inconsistent spelling variations, informal expressions, and frequent language switching within a single sentence. This paper focuses on detecting abusive content in Telugu-English code-mixed comments using both traditional machine learning and transformer-based deep learning models. The proposed approach incorporates preprocessing strategies to normalize transliterations and spelling variations, hybrid feature extraction techniques combining TF-IDF and FastText embeddings, and fine-tuning of multilingual transformer models. The study addresses challenges such as morphological complexity, contextual ambiguity, and limited annotated data in low-resource NLP environments.
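The hybrid feature extraction mentioned in the abstract can be pictured as concatenating a sparse TF-IDF vector with a dense subword-based vector for each comment. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the vector dimension, and the example comments are hypothetical, and a toy hashed character n-gram vector stands in for real FastText embeddings so that the example runs with the standard library alone.

```python
import math
import zlib
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, rows) of smoothed TF-IDF weights, one row per document."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


def subword_vector(text, dim=16, n=3):
    """Toy stand-in for a FastText sentence vector (assumption, not FastText).

    Each token is split into character n-grams, the n-grams are hashed into
    `dim` buckets, and the per-token vectors are averaged.
    """
    vec = [0.0] * dim
    tokens = text.lower().split()
    for tok in tokens:
        padded = f"<{tok}>"  # boundary markers, as subword models commonly use
        grams = [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]
        for g in grams:
            vec[zlib.crc32(g.encode("utf-8")) % dim] += 1.0 / len(grams)
    count = max(1, len(tokens))
    return [v / count for v in vec]


def hybrid_features(docs, dim=16):
    """Concatenate the TF-IDF row and the dense subword vector per document."""
    _, sparse_rows = tfidf_vectors(docs)
    return [row + subword_vector(doc, dim) for row, doc in zip(sparse_rows, docs)]


# Hypothetical romanized code-mixed comments, for illustration only.
docs = ["movie chala bagundi", "this movie is super"]
features = hybrid_features(docs, dim=8)
print(len(features), len(features[0]))
```

Character n-grams are used for the dense part because they give related vectors to spelling variants of the same transliterated word, which is the property that makes subword embeddings attractive for code-mixed text. In practice the toy vector would be replaced by embeddings from a trained subword model.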
2025
CoreFour_IIITK@DravidianLangTech 2025: Abusive Content Detection Against Women Using Machine Learning And Deep Learning Models
Varun Balaji S | Bojja Revanth Reddy | Vyshnavi Reddy Battula | Suraj Nagunuri | Balasubramanian Palani
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
The growing use of social media platforms has significantly increased user-generated content, including negative comments about women in Tamil and Malayalam. While these platforms encourage communication and engagement, they also become a medium for the spread of abusive language, which makes it harder to maintain a safe online environment for women. The main aim of this research is to curb abusive content against women as far as possible. To that end, it detects abusive language against women in Tamil and Malayalam social media comments using six computational models: logistic regression, support vector machines (SVM), random forest, multilingual BERT, XLM-RoBERTa, and IndicBERT. These models were trained and tested on a specifically curated dataset containing labeled comments in both languages. Among all the approaches, IndicBERT achieved the highest macro F1-score, 0.75. The findings emphasize the significance of combining traditional and advanced computational techniques to address the challenges of Abusive Content Detection (ACD) in regional languages.
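For readers unfamiliar with the metric reported above, macro F1 is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of how many comments it contains. The sketch below is a minimal illustration of that definition. The function name, label strings, and toy predictions are hypothetical and do not come from the shared-task data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels and predictions, for illustration only.
y_true = ["abusive", "abusive", "non-abusive", "non-abusive"]
y_pred = ["abusive", "non-abusive", "non-abusive", "non-abusive"]
print(round(macro_f1(y_true, y_pred), 4))
```

In this toy case the per-class F1 scores are 2/3 for "abusive" and 4/5 for "non-abusive", so the macro average is their plain mean.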