Subhadevi K
2025
Team_Catalysts@DravidianLangTech 2025: Leveraging Political Sentiment Analysis using Machine Learning Techniques for Classifying Tamil Tweets
Kogilavani Shanmugavadivel | Malliga Subramanian | Subhadevi K | Sowbharanika Janani Sivakumar | Rahul K
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This work proposed a methodology for assessing political sentiment in Tamil tweets using machine learning models. The approach addressed the linguistic challenges of Tamil text through a robust preprocessing pipeline covering cleaning, normalization, tokenization, and handling of class imbalance. Various models, including Random Forest, Logistic Regression, and CatBoost, were applied, with Random Forest achieving a macro F1-score of 0.2933 and securing 8th place among 153 participants in the CodaLab competition. This result highlights the effectiveness of machine learning models in handling the complexities of multilingual, code-mixed, and unstructured data in Tamil political discourse. The study also emphasized the importance of tailored preprocessing techniques for improving model accuracy and performance, and it demonstrated the potential of computational linguistics and machine learning for understanding political discourse in low-resource languages like Tamil, contributing to advances in regional sentiment analysis.
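As a rough illustration of the kind of pipeline the abstract describes, the sketch below trains a Random Forest classifier on TF-IDF features with scikit-learn and reports macro F1; the file name, column names, and hyperparameters are assumptions made for illustration, not the authors' actual configuration.

```python
# Hedged sketch: TF-IDF features feeding a Random Forest classifier, evaluated
# with macro F1. Data layout ("text", "label" columns) and hyperparameters are
# illustrative assumptions, not the paper's exact setup.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("tamil_political_tweets.csv")  # assumed file layout
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=42
)

model = Pipeline([
    # Character n-grams are a common choice for code-mixed Tamil text.
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5))),
    # class_weight="balanced" is one way to mitigate the class imbalance noted above.
    ("clf", RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=42)),
])
model.fit(X_train, y_train)
print("macro F1:", f1_score(y_test, model.predict(X_test), average="macro"))
```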
2024
MIT-KEC-NLP@DravidianLangTech-EACL 2024: Offensive Content Detection in Kannada and Kannada-English Mixed Text Using Deep Learning Techniques
Kogilavani Shanmugavadivel | Sowbarnigaa K S | Mehal Sakthi M S | Subhadevi K | Malliga Subramanian
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This study presents a strong methodology for detecting offensive content in multilingual text, with a focus on Kannada and Kannada-English code-mixed comments. Data preprocessing begins with a dataset of Kannada comments, supported by Google Translate for Kannada-English translation. Following tokenization and sequence labeling, BIO tags are assigned to mark the presence and boundaries of offensive spans within the text. A Bidirectional LSTM neural network is trained on the annotated data and achieves a macro F1 score of 61.0 in recognizing offensive content. The training process comprises data preparation, model architecture definition, and iterative training on Kannada and Kannada-English text. On a fresh dataset, the trained model predicts offensive spans in comments in these languages. The recorded predictions, including offensive span indices, are organized into a database.
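A minimal sketch of the BiLSTM sequence-labelling setup the abstract outlines, written with Keras: per-token softmax over a BIO tag set. The vocabulary size, tag inventory, and randomly generated toy batch are assumptions for illustration rather than the paper's actual data or architecture.

```python
# Hedged sketch: BiLSTM that assigns a BIO tag to every token, as described above.
# Sizes and the toy batch are illustrative assumptions.
import numpy as np
from tensorflow.keras import layers, models

MAX_LEN, VOCAB_SIZE, NUM_TAGS = 50, 20000, 3  # tags: O=0, B=1, I=2

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128, mask_zero=True),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    # One softmax over the BIO tag set per token position.
    layers.TimeDistributed(layers.Dense(NUM_TAGS, activation="softmax")),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Toy batch: integer-encoded token ids and their per-token BIO labels.
X = np.random.randint(1, VOCAB_SIZE, size=(8, MAX_LEN))
y = np.random.randint(0, NUM_TAGS, size=(8, MAX_LEN))
model.fit(X, y, epochs=1, verbose=0)

# Predicted tag index for each token of each sequence.
pred = model.predict(X, verbose=0).argmax(axis=-1)
```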
Co-authors
- Kogilavani Shanmugavadivel 2
- Malliga Subramanian 2
- Rahul K 1
- Sowbarnigaa K S 1
- Mehal Sakthi M S 1
- Sowbharanika Janani Sivakumar 1