Shanthi S

2026

ByteBuilders@DravidianLangTech 2026: Transformer-Based Weighted Ensemble for Political Multiclass Sentiment Analysis of Tamil X (Twitter) Comments
Mitharshana T V | Shanthi S | Lavana V | Kaviya Varma R
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

This paper outlines our submission to the DravidianLangTech 2026 Tamil Political Sentiment Analysis task. The task is to classify Tamil political comments by sentiment into seven categories: substantiated, sarcastic, opinionated, positive, negative, neutral, and none of the above. Classifying the sentiment of Tamil political comments is difficult, and the sentiment classes are not uniformly distributed. To address these issues, we built an ensemble of two transformer-based models, XLM-RoBERTa and IndicBERT, and used 10-fold cross-validation to improve the model's reliability and reduce overfitting. We used oversampling to address class imbalance and trained with Focal Loss so that the model concentrates more on the challenging examples. Finally, we averaged the token embeddings to improve the sentence representation.
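As a rough illustration of three components named in the abstract (Focal Loss, averaging of token embeddings, and a weighted combination of two models' outputs), the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the focusing parameter `gamma=2.0`, the masking of padding tokens, and the ensemble weight `weight_a` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    gamma is an assumed value; gamma=0 reduces to ordinary cross-entropy.
    Confidently classified examples (p_t near 1) are down-weighted, so
    harder examples contribute more to the total loss.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0.

    Ignoring padding via the mask is an assumption of this sketch.
    """
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    if not kept:
        return [0.0] * dim
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def weighted_ensemble(probs_a, probs_b, weight_a=0.5):
    """Blend two models' class probabilities; weight_a is hypothetical."""
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(probs_a, probs_b)]
```

In a setup like the one described, `probs_a` and `probs_b` would stand for the per-class probabilities produced by the two transformer models, with the predicted label taken as the index of the largest blended probability.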
