Abstract
The rapid dissemination of misinformation through online social networks is a pressing problem with harmful consequences for human health, public safety, democracy, and the economy, and it calls for urgent action. In this study, we construct a new human-annotated dataset, called MiDe22, comprising 5,284 English and 5,064 Turkish tweets with misinformation labels for several recent events between 2020 and 2022, including the Russia-Ukraine war, the COVID-19 pandemic, and Refugees. The dataset also includes user engagements with the tweets in terms of likes, replies, retweets, and quotes. We further provide a detailed data analysis with descriptive statistics and the experimental results of a benchmark evaluation for misinformation detection.
- Anthology ID:
- 2024.lrec-main.986
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- Publisher:
- ELRA and ICCL
- Pages:
- 11283–11295
- URL:
- https://aclanthology.org/2024.lrec-main.986
- Cite (ACL):
- Cagri Toraman, Oguzhan Ozcelik, Furkan Sahinuc, and Fazli Can. 2024. MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 11283–11295, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection (Toraman et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2024.lrec-main.986.pdf