MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection

Cagri Toraman, Oguzhan Ozcelik, Furkan Sahinuc, Fazli Can


Abstract
The rapid dissemination of misinformation through online social networks poses a pressing issue with harmful consequences jeopardizing human health, public safety, democracy, and the economy; therefore, urgent action is required to address this problem. In this study, we construct a new human-annotated dataset, called MiDe22, having 5,284 English and 5,064 Turkish tweets with their misinformation labels for several recent events between 2020 and 2022, including the Russia-Ukraine war, COVID-19 pandemic, and Refugees. The dataset includes user engagements with the tweets in terms of likes, replies, retweets, and quotes. We also provide a detailed data analysis with descriptive statistics and the experimental results of a benchmark evaluation for misinformation detection.
Anthology ID:
2024.lrec-main.986
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
11283–11295
URL:
https://aclanthology.org/2024.lrec-main.986
Cite (ACL):
Cagri Toraman, Oguzhan Ozcelik, Furkan Sahinuc, and Fazli Can. 2024. MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 11283–11295, Torino, Italia. ELRA and ICCL.
Cite (Informal):
MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection (Toraman et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2024.lrec-main.986.pdf