Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data
Akshat Gupta, Sargam Menghani, Sai Krishna Rallabandi, Alan W Black
Abstract
Sentiment analysis is an important task in understanding social media content such as customer reviews and Twitter and Facebook feeds. In multilingual communities around the world, a large amount of social media text is characterized by the presence of code-switching. It has therefore become important to build models that can handle code-switched data. However, annotated code-switched data is scarce, creating a need for unsupervised models and algorithms. We propose a general framework called Unsupervised Self-Training and demonstrate it on the specific use case of sentiment analysis of code-switched data. We use pre-trained BERT models for initialization and fine-tune them in an unsupervised manner, using only pseudo labels produced by zero-shot transfer. We test our algorithm on multiple code-switched languages and provide a detailed analysis of the learning dynamics of the algorithm, with the aim of answering the question: ‘Does our unsupervised model understand the code-switched languages, or does it just learn their representations?’ Our unsupervised models compete well with their supervised counterparts, with performance reaching within 1-7% (weighted F1 scores) of supervised models trained for a two-class problem.
- Anthology ID: 2021.calcs-1.13
- Volume: Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
- Month: June
- Year: 2021
- Address: Online
- Editors: Thamar Solorio, Shuguang Chen, Alan W. Black, Mona Diab, Sunayana Sitaram, Victor Soto, Emre Yilmaz, Anirudh Srinivasan
- Venue: CALCS
- Publisher: Association for Computational Linguistics
- Pages: 103–112
- URL: https://preview.aclanthology.org/ingest_wac_2008/2021.calcs-1.13/
- DOI: 10.18653/v1/2021.calcs-1.13
- Cite (ACL): Akshat Gupta, Sargam Menghani, Sai Krishna Rallabandi, and Alan W Black. 2021. Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data. In Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching, pages 103–112, Online. Association for Computational Linguistics.
- Cite (Informal): Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data (Gupta et al., CALCS 2021)
- PDF: https://preview.aclanthology.org/ingest_wac_2008/2021.calcs-1.13.pdf
- Data: TweetEval
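The abstract describes a pseudo-label self-training loop: initialize a classifier via zero-shot transfer, label the unlabeled pool with it, and iteratively fine-tune on its own most confident predictions. The sketch below illustrates that loop only; it is not the paper's implementation. A scikit-learn logistic regression stands in for the pre-trained BERT model, synthetic 2-D points stand in for sentence representations, and all names (`self_train`, `per_class`, etc.) are illustrative assumptions.

```python
# Minimal sketch of pseudo-label self-training (assumed names throughout).
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(model, X_pool, rounds=3, per_class=10):
    """Repeatedly pseudo-label the most confident unlabeled points per class
    and retrain the model on the accumulated pseudo-labeled set."""
    X_train = np.empty((0, X_pool.shape[1]))
    y_train = np.empty(0, dtype=int)
    for _ in range(rounds):
        probs = model.predict_proba(X_pool)
        preds = probs.argmax(axis=1)          # pseudo labels
        conf = probs.max(axis=1)              # model confidence
        picked = []
        for c in range(probs.shape[1]):
            cls_idx = np.where(preds == c)[0]
            # most confident examples predicted as class c
            best = cls_idx[np.argsort(conf[cls_idx])[::-1][:per_class]]
            picked.extend(best.tolist())
        picked = np.array(picked, dtype=int)
        X_train = np.vstack([X_train, X_pool[picked]])
        y_train = np.concatenate([y_train, preds[picked]])
        model.fit(X_train, y_train)           # "fine-tune" on pseudo labels only
        X_pool = np.delete(X_pool, picked, axis=0)  # shrink the unlabeled pool
    return model

# Synthetic stand-ins: a small labeled "source" set mimics pre-training and
# zero-shot transfer; the unlabeled set mimics the code-switched target data.
rng = np.random.default_rng(0)
X_src = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y_src = np.array([0] * 20 + [1] * 20)
X_unl = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])

model = LogisticRegression().fit(X_src, y_src)   # zero-shot starting point
model = self_train(model, X_unl, rounds=3, per_class=10)
```

Selecting confident examples per predicted class, rather than globally, keeps both sentiment classes represented in each retraining round; the paper's actual selection strategy may differ.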