FACTOID: A New Dataset for Identifying Misinformation Spreaders and Political Bias

Flora Sakketou, Joan Plepi, Riccardo Cervero, Henri Jacques Geiss, Paolo Rosso, Lucie Flek


Abstract
Proactively identifying misinformation spreaders is an important step towards mitigating the impact of fake news on our society. In this paper, we introduce FACTOID, a new contemporary Reddit dataset for fake news spreader analysis that monitors political discussions on Reddit since the beginning of 2020. The dataset contains over 4K users with 3.4M Reddit posts. In addition to the users' binary labels, it includes their fine-grained credibility level (very low to very high) and their political bias strength (extreme right to extreme left). As far as we are aware, this is the first fake news spreader dataset that simultaneously captures both the long-term context of users' historical posts and the interactions between them. To establish a first benchmark on our data, we provide methods for identifying misinformation spreaders that use the social connections between users along with their psycho-linguistic features. We show that users' social interactions alone can indicate misinformation spreading, while the psycho-linguistic features are informative mostly in non-neural classification settings. In a qualitative analysis, we observe that the detection of affective mental processes correlates negatively with right-biased users, and that the openness-to-experience factor is lower for users who spread fake news.
Anthology ID: 2022.lrec-1.345
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Venue: LREC
Publisher: European Language Resources Association
Pages: 3231–3241
URL: https://aclanthology.org/2022.lrec-1.345
Cite (ACL):
Flora Sakketou, Joan Plepi, Riccardo Cervero, Henri Jacques Geiss, Paolo Rosso, and Lucie Flek. 2022. FACTOID: A New Dataset for Identifying Misinformation Spreaders and Political Bias. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3231–3241, Marseille, France. European Language Resources Association.
Cite (Informal):
FACTOID: A New Dataset for Identifying Misinformation Spreaders and Political Bias (Sakketou et al., LREC 2022)
PDF: https://preview.aclanthology.org/auto-file-uploads/2022.lrec-1.345.pdf
Code: caisa-lab/factoid-dataset
Data: RealNews