Gladiss Merlin N.r

2025

RMKMavericks@DravidianLangTech 2025: Emotion Mining in Tamil and Tulu Code-Mixed Text: Challenges and Insights
Gladiss Merlin N.r | Boomika E | Lahari P
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Sentiment analysis in code-mixed social media comments written in Tamil and Tulu presents unique challenges due to grammatical inconsistencies, code-switching, and the use of non-native scripts. To address these complexities, we employ pre-processing techniques for text cleaning and evaluate machine learning models tailored for sentiment detection. Traditional machine learning methods combined with feature extraction strategies, such as TF-IDF, are utilized. While logistic regression demonstrated reasonable performance on the Tamil dataset, achieving a macro F1 score of 0.44, support vector machines (SVM) outperformed logistic regression on the Tulu dataset with a macro F1 score of 0.54. These results demonstrate the effectiveness of traditional approaches, particularly SVM, in handling low-resource, multilingual data, while also highlighting the need for further refinement to improve performance across underrepresented sentiment classes.
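
The macro F1 scores quoted in the abstract average per-class F1 with equal weight, which is why weak performance on underrepresented sentiment classes lowers the score even when overall accuracy looks acceptable. The following is a minimal sketch of that computation in plain Python. The function name `macro_f1` and the toy labels are illustrative assumptions, not the authors' evaluation code or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gets a false positive
            fn[truth] += 1  # true label gets a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: a classifier that always predicts the majority class.
y_true = ["pos", "pos", "pos", "pos", "neg", "mixed"]
y_pred = ["pos", "pos", "pos", "pos", "pos", "pos"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case the majority class scores an F1 of 0.8 while the two minority classes score 0, so the macro average falls well below the accuracy. That gap is the kind of effect the abstract points to when it mentions underrepresented sentiment classes.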

Co-authors

Boomika E 1
Lahari P 1
