Mute Cods: A Multilingual Telegram Dataset with Benchmark Models for Conspiracy Theory Detection

Katarina Laken; Erik Bran Marino; Paloma Piot; Davide Bassi; Søren Kirkegaard Fomsgaard; Michele Joshua Maggini; Renata Vieira; Marcos Garcia; Sara Tonelli

Abstract

The proliferation of conspiracy theories and hateful messages on social media poses significant challenges for content moderation and public discourse. Despite their societal impact, existing datasets for automated conspiracy detection remain limited in scope and language coverage. We present a multilingual dataset of conspiracy content on Telegram comprising 5750 messages in English, Dutch, Italian, Spanish and Portuguese, drawn from 87 channels documented as disseminating conspiracist and extremist content. Domain experts annotated messages for conspiracist tone, population replacement conspiracy theories (PRCT), vaccine conspiracy theories, and hate speech. We report extensively on the difficulties and caveats of creating and annotating this type of dataset. We establish classification baselines by evaluating six models in a zero-shot setting and fine-tuning three encoder models, achieving F1 scores of up to 0.800 for conspiracist tone, 0.846 for PRCT, 0.843 for vaccine-related conspiracy theories, and 0.734 for hate speech. Inter-annotator agreement was moderate, consistent with the complexity documented in similar annotation tasks.

Anthology ID: 2026.lrec-main.582
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 7345–7358
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.582/
Cite (ACL): Katarina Laken, Erik Bran Marino, Paloma Piot, Davide Bassi, Søren Kirkegaard Fomsgaard, Michele Joshua Maggini, Renata Vieira, Marcos Garcia, and Sara Tonelli. 2026. Mute Cods: A Multilingual Telegram Dataset with Benchmark Models for Conspiracy Theory Detection. International Conference on Language Resources and Evaluation, main:7345–7358.
Cite (Informal): Mute Cods: A Multilingual Telegram Dataset with Benchmark Models for Conspiracy Theory Detection (Laken et al., LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.582.pdf
