Mute Cods: A Multilingual Telegram Dataset with Benchmark Models for Conspiracy Theory Detection
Katarina Laken, Erik Bran Marino, Paloma Piot, Davide Bassi, Søren Kirkegaard Fomsgaard, Michele Joshua Maggini, Renata Vieira, Marcos Garcia, Sara Tonelli
Abstract
The proliferation of conspiracy theories and hateful messages on social media poses significant challenges for content moderation and public discourse. Despite their societal impact, existing datasets for automated conspiracy detection remain limited in scope and language coverage. We present a multilingual dataset of conspiracy content on Telegram comprising 5,750 messages in English, Dutch, Italian, Spanish, and Portuguese, drawn from 87 channels documented as disseminating conspiracist and extremist content. Domain experts annotated the messages for conspiracist tone, population replacement conspiracy theories (PRCT), vaccine-related conspiracy theories, and hate speech. We report in detail on the difficulties and caveats of creating and annotating this type of dataset. We establish classification baselines by evaluating six models in a zero-shot setting and fine-tuning three encoder models, achieving F1 scores of up to 0.800 for conspiracist tone, 0.846 for PRCT, 0.843 for vaccine-related conspiracy theories, and 0.734 for hate speech. Inter-annotator agreement was moderate, consistent with the complexity documented for similar annotation tasks.
- Anthology ID: 2026.lrec-main.582
- Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month: May
- Year: 2026
- Address: Palma de Mallorca, Spain
- Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue: LREC
- Publisher: ELRA Language Resource Association
- Pages: 7345–7358
- URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.582/
- Cite (ACL): Katarina Laken, Erik Bran Marino, Paloma Piot, Davide Bassi, Søren Kirkegaard Fomsgaard, Michele Joshua Maggini, Renata Vieira, Marcos Garcia, and Sara Tonelli. 2026. Mute Cods: A Multilingual Telegram Dataset with Benchmark Models for Conspiracy Theory Detection. International Conference on Language Resources and Evaluation, main:7345–7358.
- Cite (Informal): Mute Cods: A Multilingual Telegram Dataset with Benchmark Models for Conspiracy Theory Detection (Laken et al., LREC 2026)
- PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.582.pdf