Burugu Rahul

2025

ANSR@DravidianLangTech 2025: Detection of Abusive Tamil and Malayalam Text Targeting Women on Social Media using RoBERTa and XGBoost
Nishanth S | Shruthi Rengarajan | S Ananthasivan | Burugu Rahul | Sachin Kumar S
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Abusive language directed at women on social media, often characterized by crude slang, offensive terms, and profanity, is not just harmful communication but also a tool for serious and widespread cyber violence. Addressing this issue is imperative for establishing safer online spaces and for providing efficient methods of detecting and minimising this kind of abuse. However, the intentional masking of abusive language, especially in regional languages like Tamil and Malayalam, presents significant obstacles, making detection and prevention more difficult. The proposed system identifies abusive sentences using supervised machine learning techniques based on RoBERTa embeddings. The method aims to improve upon current abusive language detection systems, which are essential for various online platforms, including social media and online gaming services. The proposed method ranked 8th in Malayalam and 20th in Tamil in terms of F1 score.

NSR_LT-EDI-2025 Automatic speech recognition in Tamil
Nishanth S | Shruthi Rengarajan | Burugu Rahul | Jyothish Lal G
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

Automatic Speech Recognition (ASR) technology can potentially make services more accessible to marginalized communities. However, older adults and transgender speakers are usually highly disadvantaged in accessing valuable services due to low digital literacy and social biases. In Tamil-speaking regions, these disadvantages are further compounded by the inability of ASR models to handle their unique speech types, accents, and spontaneous speaking styles. To bridge this gap, the LT-EDI-2025 shared task is designed to develop robust ASR systems for Tamil speech from vulnerable populations. Using Whisper-based models, this task is designed to improve recognition rates in speech data collected from older adults and transgender speakers in naturalistic settings such as banks, hospitals, and public offices. By addressing the linguistic heterogeneity and acoustic variability among this underrepresented population, the shared task is designed to develop inclusive AI solutions that break communication barriers and empower vulnerable populations in Tamil Nadu.
