Advancing Sentiment Analysis in Tamil-English Code-Mixed Texts: Challenges and Transformer-Based Solutions
Mikhail Krasitskii, Olga Kolesnikova, Liliana Chanona Hernandez, Grigori Sidorov, Alexander Gelbukh
Abstract
This study examines sentiment analysis in Tamil-English code-mixed texts using advanced transformer-based architectures. The unique linguistic challenges, including mixed grammar, orthographic variability, and phonetic inconsistencies, are addressed. Data limitations and annotation gaps are discussed, highlighting the need for larger datasets. The performance of models such as XLM-RoBERTa, mT5, IndicBERT, and RemBERT is evaluated, with insights into their optimization for low-resource, code-mixed environments.
- Anthology ID:
- 2025.nlp4dh-1.27
- Volume:
- Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
- Month:
- May
- Year:
- 2025
- Address:
- Albuquerque, USA
- Editors:
- Mika Hämäläinen, Emily Öhman, Yuri Bizzoni, So Miyagawa, Khalid Alnajjar
- Venues:
- NLP4DH | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 305–312
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2025.nlp4dh-1.27/
- Cite (ACL):
- Mikhail Krasitskii, Olga Kolesnikova, Liliana Chanona Hernandez, Grigori Sidorov, and Alexander Gelbukh. 2025. Advancing Sentiment Analysis in Tamil-English Code-Mixed Texts: Challenges and Transformer-Based Solutions. In Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities, pages 305–312, Albuquerque, USA. Association for Computational Linguistics.
- Cite (Informal):
- Advancing Sentiment Analysis in Tamil-English Code-Mixed Texts: Challenges and Transformer-Based Solutions (Krasitskii et al., NLP4DH 2025)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2025.nlp4dh-1.27.pdf