Trailblazer@DravidianLangTech 2026: A Comparative Study of TF-IDF SVM and XLM-RoBERTa for Political Multiclass Text Classification

Anuradha C, Anbuaruvi R, Shanthi Murugan


Abstract
The rapid growth of social media has made the classification of multilingual and code-mixed text increasingly challenging. The shared task on Political Multiclass Sentiment Analysis of Tamil X (Twitter) at DravidianLangTech@ACL 2026 addresses the classification of political text. For this task, we compare a traditional machine learning approach with transformer-based models. First, we developed a baseline Support Vector Machine (SVM) model using TF-IDF features. To provide a stronger Indic-language baseline, we consider IndicBERT, a transformer model specifically designed for Indian languages, which improves contextual understanding of Tamil-English code-mixed political text. To capture deeper information from the text, we developed an XLM-RoBERTa model with minimal pre-processing. The results show that the transformer-based model outperforms the traditional baseline, with a macro F1 score of 0.3738. The study highlights the importance of robust multi-class classification of political text on social media.
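Since the abstract reports its result as a macro F1 score, a small worked sketch of that metric may help: macro F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The function name `macro_f1` and the three sentiment labels below are hypothetical and chosen only for illustration; they are not the label set or the evaluation code used in the shared task.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one example is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); a class with no hits scores 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
gold = ["Positive", "Positive", "Negative", "Neutral"]
pred = ["Positive", "Negative", "Negative", "Neutral"]
score = macro_f1(gold, pred)
print(round(score, 4))
```

In this toy example the per-class F1 scores are 2/3 (Positive), 2/3 (Negative), and 1 (Neutral), so the macro average is 7/9. Because every class contributes equally, a model that neglects minority classes is penalized more heavily than it would be under accuracy.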
Anthology ID:
2026.dravidianlangtech-1.66
Volume:
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month:
July
Year:
2026
Address:
Underline (Virtual)
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Saranya Rajiakodi, Subalalitha Navaneethakrishnan, Dhivya Chinnappa, Balasubramanian Palani, Malliga Subramanian, Kogilavani Shanmugavadivel, Ratnavel Rajalakshmi
Venues:
DravidianLangTech | WS
Publisher:
Association for Computational Linguistics
Pages:
414–419
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.66/
Cite (ACL):
Anuradha C, Anbuaruvi R, and Shanthi Murugan. 2026. Trailblazer@DravidianLangTech 2026: A Comparative Study of TF-IDF SVM and XLM-RoBERTa for Political Multiclass Text Classification. In Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 414–419, Underline (Virtual). Association for Computational Linguistics.
Cite (Informal):
Trailblazer@DravidianLangTech 2026: A Comparative Study of TF-IDF SVM and XLM-RoBERTa for Political Multiclass Text Classification (C et al., DravidianLangTech 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.66.pdf