Content-based Stance Classification of Tweets about the 2020 Italian Constitutional Referendum

Marco Di Giovanni, Marco Brambilla


Abstract
In September 2020, a constitutional referendum was held in Italy. In this work, we collect a dataset of 1.2M tweets related to this event, with particular interest in the textual content shared, and we design a hashtag-based semi-automatic approach to label them as Supporters or Against the referendum. We use the labeled dataset to train a transformer-based classifier, pre-trained in an unsupervised manner on Italian corpora. Our model generalizes well to tweets that cannot be labeled by the hashtag-based approach. We verify that no length, lexicon, or sentiment biases affect the performance of the classifier. Finally, we discuss the discrepancy between the volume of tweets expressing each stance, obtained using both the hashtag-based approach and our trained classifier, and the real outcome of the referendum: the referendum was approved by 70% of the voters, while the number of tweets against the referendum is four times greater than the number of tweets supporting it. We conclude that the Italian referendum was an example of an event where the minority was very loud on social media, strongly influencing the perception of the event. Analyzing only the activity on social media is dangerous and can lead to extremely wrong forecasts.
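
The hashtag-based labeling step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hashtag lists, the function name `label_tweet`, and the rule of leaving tweets with mixed or missing stance hashtags unlabeled are all assumptions made for illustration, and the sketch covers only the automatic matching, not any manual curation that a semi-automatic approach may involve.

```python
import re
from typing import Optional

# Hypothetical stance hashtags, chosen only for illustration.
# The abstract does not list the hashtags actually used.
SUPPORTER_TAGS = {"#iovotosi", "#votosi"}
AGAINST_TAGS = {"#iovotono", "#votono"}

# A hashtag is a '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#\w+")


def label_tweet(text: str) -> Optional[str]:
    """Return 'Supporters', 'Against', or None when no clear label applies."""
    # Lowercase the hashtags so that matching is case-insensitive.
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    has_support = bool(tags & SUPPORTER_TAGS)
    has_against = bool(tags & AGAINST_TAGS)
    if has_support and not has_against:
        return "Supporters"
    if has_against and not has_support:
        return "Against"
    # No stance hashtag, or hashtags from both sides: leave unlabeled
    # (assumed rule; such tweets would be left to a trained classifier).
    return None
```

Under these assumptions, tweets for which `label_tweet` returns `None` correspond to those that cannot be labeled by hashtags alone, which is the case the abstract says the trained classifier handles.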
Anthology ID:
2021.socialnlp-1.2
Volume:
Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media
Month:
June
Year:
2021
Address:
Online
Editors:
Lun-Wei Ku, Cheng-Te Li
Venue:
SocialNLP
Publisher:
Association for Computational Linguistics
Pages:
14–23
URL:
https://aclanthology.org/2021.socialnlp-1.2
DOI:
10.18653/v1/2021.socialnlp-1.2
Cite (ACL):
Marco Di Giovanni and Marco Brambilla. 2021. Content-based Stance Classification of Tweets about the 2020 Italian Constitutional Referendum. In Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media, pages 14–23, Online. Association for Computational Linguistics.
Cite (Informal):
Content-based Stance Classification of Tweets about the 2020 Italian Constitutional Referendum (Di Giovanni & Brambilla, SocialNLP 2021)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2021.socialnlp-1.2.pdf
Code:
marco-digio/italian-referendum-2020