On the logistical difficulties and findings of Jopara Sentiment Analysis

Marvin Agüero-Torales; David Vilares; Antonio López-Herrera

doi:10.18653/v1/2021.calcs-1.12

Abstract

This paper addresses the problem of sentiment analysis for Jopara, a code-switching language between Guarani and Spanish. We first collect a corpus of Guarani-dominant tweets and discuss the difficulties of finding quality data even for relatively easy-to-annotate tasks such as sentiment analysis. We then train a set of neural models, including pre-trained language models, and explore whether they outperform traditional machine learning models in this low-resource setup. Transformer architectures obtain the best results despite not having seen Guarani during pre-training, but traditional machine learning models perform close behind, owing to the low-resource nature of the problem.
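To illustrate the kind of "traditional machine learning" baseline the abstract contrasts with Transformers, here is a minimal sketch, assuming a multinomial Naive Bayes classifier over character n-grams (a common choice for noisy, code-switched text). This is not the paper's model; the class name `CharNGramNB`, the n-gram range, and the toy Jopara-style sentences are hypothetical and invented for illustration only.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    """Return overlapping character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class CharNGramNB:
    """Hypothetical baseline: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            vocab.update(grams)
        self.vocab_size = len(vocab)
        self.totals = {c: sum(self.feature_counts[c].values())
                       for c in self.class_counts}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        grams = char_ngrams(text)
        best_label, best_score = None, -math.inf
        for c in sorted(self.class_counts):
            # log prior + sum of smoothed log likelihoods of each n-gram
            score = math.log(self.class_counts[c] / self.n_docs)
            denom = self.totals[c] + self.vocab_size
            for g in grams:
                score += math.log((self.feature_counts[c][g] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


if __name__ == "__main__":
    # Toy, invented training data (not from the corpus described above).
    texts = ["iporã la música", "muy iporã el partido",
             "vai la comida", "muy vai el servicio"]
    labels = ["pos", "pos", "neg", "neg"]
    model = CharNGramNB().fit(texts, labels)
    print(model.predict("iporã"), model.predict("vai"))
```

Character n-grams are used here because they need no language-specific tokenizer and can share subword cues across Guarani and Spanish; with so little data, such a model is cheap to train, which is consistent with the abstract's observation that traditional models remain competitive.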

Anthology ID:: 2021.calcs-1.12
Volume:: Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
Month:: June
Year:: 2021
Address:: Online
Venues:: CALCS | NAACL
Publisher:: Association for Computational Linguistics
Pages:: 95–102
URL:: https://aclanthology.org/2021.calcs-1.12
DOI:: 10.18653/v1/2021.calcs-1.12
Cite (ACL):: Marvin Agüero-Torales, David Vilares, and Antonio López-Herrera. 2021. On the logistical difficulties and findings of Jopara Sentiment Analysis. In Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching, pages 95–102, Online. Association for Computational Linguistics.
Cite (Informal):: On the logistical difficulties and findings of Jopara Sentiment Analysis (Agüero-Torales et al., CALCS 2021)
PDF:: https://preview.aclanthology.org/update-css-js/2021.calcs-1.12.pdf
Code:: mmaguero/josa-corpus