Abstract
This paper addresses the problem of sentiment analysis for Jopara, a code-switching variety that mixes Guarani and Spanish. We first collect a corpus of Guarani-dominant tweets and discuss the difficulties of finding quality data even for relatively easy-to-annotate tasks, such as sentiment analysis. Then, we train a set of neural models, including pre-trained language models, and explore whether they outperform traditional machine learning models in this low-resource setup. Transformer architectures obtain the best results, despite not having seen Guarani during pre-training, but traditional machine learning models perform competitively due to the low-resource nature of the problem.

- Anthology ID: 2021.calcs-1.12
- Volume: Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
- Month: June
- Year: 2021
- Address: Online
- Editors: Thamar Solorio, Shuguang Chen, Alan W. Black, Mona Diab, Sunayana Sitaram, Victor Soto, Emre Yilmaz, Anirudh Srinivasan
- Venue: CALCS
- Publisher: Association for Computational Linguistics
- Pages: 95–102
- URL: https://aclanthology.org/2021.calcs-1.12
- DOI: 10.18653/v1/2021.calcs-1.12
- Cite (ACL): Marvin Agüero-Torales, David Vilares, and Antonio López-Herrera. 2021. On the logistical difficulties and findings of Jopara Sentiment Analysis. In Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching, pages 95–102, Online. Association for Computational Linguistics.
- Cite (Informal): On the logistical difficulties and findings of Jopara Sentiment Analysis (Agüero-Torales et al., CALCS 2021)
- PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/2021.calcs-1.12.pdf
- Code: mmaguero/josa-corpus
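The abstract contrasts pre-trained transformers with traditional machine learning baselines for low-resource sentiment analysis. As a rough illustration of what such a traditional baseline looks like, the sketch below implements a bag-of-words multinomial Naive Bayes classifier in plain Python. This is not the authors' setup: the class names, tokenizer, and the toy Jopara-style example tweets and labels are all invented for illustration, not taken from the josa-corpus.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Minimal lowercase whitespace tokenization; real code-switched
    # tweets would also need handling of diacritics, emoji, mentions, etc.
    return text.lower().split()


class NaiveBayes:
    """Bag-of-words multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total = sum(self.label_counts.values())
        best, best_lp = None, float("-inf")
        for label, count in self.label_counts.items():
            # log prior + sum of smoothed log likelihoods
            lp = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in tokens:
                lp += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best


# Toy usage with invented Jopara-flavored examples (not corpus data):
train_texts = ["che avy'a la partido", "mal dia, rombyasy", "iporã la clima hoy"]
train_labels = ["pos", "neg", "pos"]
clf = NaiveBayes().fit(train_texts, train_labels)
print(clf.predict("iporã la partido"))  # → pos
```

With only a handful of labeled tweets, models of this kind remain competitive with neural approaches, which is consistent with the paper's observation that traditional machine learning performs close to transformers in this low-resource setting.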