Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus

Elena Zotova, Rodrigo Agerri, Manuel Nuñez, German Rigau


Abstract
Stance detection aims to determine the attitude of a given text with respect to a specific topic or claim. While stance detection has been fairly well researched in recent years, most of the work has focused on English, mainly because of the relative lack of annotated data in other languages. The TW-10 Referendum Dataset released at IberEval 2018 was an earlier effort to provide multilingual stance-annotated data in Catalan and Spanish. Unfortunately, the TW-10 Catalan subset is extremely imbalanced. This paper addresses these issues by presenting a new multilingual dataset for stance detection in Twitter for the Catalan and Spanish languages, with the aim of facilitating research on stance detection in multilingual and cross-lingual settings. The dataset is annotated with stance towards one topic, namely, the independence of Catalonia. We also provide a semi-automatic method to annotate the dataset based on a categorization of Twitter users. We experiment on the new corpus with a number of supervised approaches, including linear classifiers and deep learning methods. Comparison of our new corpus with the TW-10 dataset shows both the benefits and the potential of a well-balanced corpus for multilingual and cross-lingual research on stance detection. Finally, we establish new state-of-the-art results on the TW-10 dataset, both for Catalan and Spanish.
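The abstract only states that the semi-automatic annotation is based on a categorization of Twitter users. One way to picture the idea is the minimal sketch below, which rests on assumptions not taken from the paper: each known user is assigned a category, each category maps to a stance label, and every tweet inherits the label of its author's category. Tweets whose author has no known category are set aside for manual annotation. The category names, the `CATEGORY_STANCE` mapping, and the `propagate_labels` function are all hypothetical and are not the authors' implementation.

```python
# Hypothetical mapping from user category to stance label.
# The categories and labels are illustrative assumptions, not the paper's scheme.
CATEGORY_STANCE = {
    "pro_independence_party": "FAVOR",
    "unionist_party": "AGAINST",
    "news_outlet": "NEUTRAL",
}


def propagate_labels(tweets, user_category):
    """Label each tweet with the stance of its author's category (hypothetical helper).

    tweets: list of (user, text) pairs.
    user_category: dict mapping a user handle to a category name.
    Returns (labelled, needs_review): tweets whose author has a category in
    CATEGORY_STANCE receive that stance; all other tweets are returned
    unlabelled so a human annotator can decide.
    """
    labelled, needs_review = [], []
    for user, text in tweets:
        # Unknown users yield None from the first lookup, and None is not a
        # key of CATEGORY_STANCE, so the second lookup also yields None.
        stance = CATEGORY_STANCE.get(user_category.get(user))
        if stance is None:
            needs_review.append((user, text))
        else:
            labelled.append((user, text, stance))
    return labelled, needs_review


if __name__ == "__main__":
    tweets = [("@a", "tweet 1"), ("@b", "tweet 2"), ("@c", "tweet 3")]
    categories = {"@a": "pro_independence_party", "@b": "unionist_party"}
    labelled, needs_review = propagate_labels(tweets, categories)
    print(labelled)
    print(needs_review)
```

The manual-review bucket is what keeps such a scheme semi-automatic: the user-level labels serve only as a starting point, and any tweet the mapping cannot cover still needs a human decision.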
Anthology ID:
2020.lrec-1.171
Volume:
Proceedings of the 12th Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
1368–1375
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.171
Cite (ACL):
Elena Zotova, Rodrigo Agerri, Manuel Nuñez, and German Rigau. 2020. Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 1368–1375, Marseille, France. European Language Resources Association.
Cite (Informal):
Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus (Zotova et al., LREC 2020)
PDF:
https://preview.aclanthology.org/update-css-js/2020.lrec-1.171.pdf