Praveenkumar C

2025

pdf bib abs
KEC_TECH_TITANS@DravidianLangTech 2025: Abusive Text Detection in Tamil and Malayalam Social Media Comments Using Machine Learning
Malliga Subramanian | Kogilavani Shanmugavadivel | Deepiga P | Dharshini S | Ananthakumar S | Praveenkumar C
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Social media platforms have become a breeding ground for hostility and toxicity, with abusive language targeting women becoming a pervasive issue. This paper addresses the detection of abusive content in Tamil and Malayalam social media comments using machine learning models. We experimented with GRU, LSTM, Bidirectional LSTM, CNN, FastText, and XGBoost models, evaluating their performance on a code-mixed dataset of Tamil and Malayalam comments collected from YouTube. Our findings demonstrate that FastText and CNN models yielded the best performance among the evaluated classifiers, achieving F1-scores of 0.73 each. This study contributes to the ongoing research on abusive text detection for under-resourced languages and highlights the need for robust, scalable solutions to combat online toxicity.

pdf bib abs
KEC_TECH_TITANS@DravidianLangTech 2025:Sentiment Analysis for Low-Resource Languages: Insights from Tamil and Tulu using Deep Learning and Machine Learning Models
Malliga Subramanian | Kogilavani Shanmugavadivel | Dharshini S | Deepiga P | Praveenkumar C | Ananthakumar S
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Sentiment analysis in Dravidian languages like Tamil and Tulu presents significant challenges due to their linguistic diversity and limited resources for natural language processing (NLP). This study explores sentiment classification for Tamil and Tulu, focusing on the complexities of handling both languages, which differ in script, grammar, and vocabulary. We employ a variety of machine learning and deep learning techniques, including traditional models like Support Vector Machines (SVM), and K-Nearest Neighbors (KNN), as well as advanced transformer-based models like BERT and multilingual BERT (mBERT). A key focus of this research is to evaluate the performance of these models on sentiment analysis tasks, considering metrics such as accuracy, precision, recall, and F1-score. The results show that transformer-based models, particularly mBERT, significantly outperform traditional machine learning models in both Tamil and Tulu sentiment classification. This study also highlights the need for further research into addressing challenges like language-specific nuances, dataset imbalance, and data augmentation techniques for improved sentiment analysis in under-resourced languages like Tamil and Tulu.

Co-authors

Venues

Fix data