Davide Bassi
2026
IHPP: A Paragraph-Level Dataset for Investigating the Pragmatics of Hyperpartisan Italian News
Michele Joshua Maggini | Davide Bassi | Angelo Valente | Gaël Dias | Pablo Gamallo
Proceedings of the Fifteenth Language Resources and Evaluation Conference
This study investigates the linguistic composition of hyperpartisan paragraphs in Italian news on climate change, the Ukraine war, and immigration, and publicly releases the dataset to ensure reproducibility. We introduce a new corpus, IHPP, of 356 articles, for a total of 4,861 paragraphs annotated for hyperpartisan news detection at the paragraph level and enriched with span-level annotations of six semantic-pragmatic linguistic traits: figurative speech, irony/sarcasm, epithet, as well as hyperbolic and loaded language. We hypothesized that these traits, while violating Gricean maxims, are key mechanisms of hyperpartisan rhetoric. To test this, we fine-tuned a set of mono- and multilingual BERT models for hyperpartisan detection and evaluated their incorporation in the embedding space. Then, we applied explainability techniques, e.g., Integrated Gradients and SHAP, to analyze how models allocate attribution to normal and linguistic-trait tokens. Our results show that loaded language is the most discriminative trait. The dataset is released at https://github.com/MichJoM/IHPP-Climate.
Mute Cods: A Multilingual Telegram Dataset with Benchmark Models for Conspiracy Theory Detection
Katarina Laken | Erik Bran Marino | Paloma Piot | Davide Bassi | Søren Kirkegaard Fomsgaard | Michele Joshua Maggini | Renata Vieira | Marcos Garcia | Sara Tonelli
Proceedings of the Fifteenth Language Resources and Evaluation Conference
The proliferation of conspiracy theories and hateful messages on social media poses significant challenges for content moderation and public discourse. Despite their societal impact, existing datasets for automated conspiracy detection remain limited in scope and language coverage. We present a multilingual dataset of conspiracy content on Telegram comprising 5750 messages across English, Dutch, Italian, Spanish and Portuguese from 87 channels documented as disseminating conspiracist and extremist content. Domain experts annotated messages for conspiracist tone, population replacement conspiracy theories, vaccine conspiracies, and hate speech. We extensively report on difficulties and caveats when creating and annotating this type of dataset. We establish classification baselines by evaluating six models in zero-shot fashion and fine-tuning three encoder models, achieving F1 scores up to 0.800 for conspiracist tone, 0.846 for PRCT, 0.843 for vaccine-related conspiracy theories, and 0.734 for hate speech. Inter-annotator agreement was moderate, consistent with the complexity documented in similar annotation tasks.
2025
Detecting Hyperpartisanship and Rhetorical Bias in Climate Journalism: A Sentence-Level Italian Dataset
Michele Joshua Maggini | Davide Bassi | Pablo Gamallo
Proceedings of the 2nd Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2025)
We present the first Italian dataset for joint hyperpartisan and rhetorical bias detection in climate change discourse. The dataset comprises 48 articles (1,010 sentences) from far-right media outlets, annotated at sentence level for both binary hyperpartisan classification and a fine-grained taxonomy of 17 rhetorical biases. Our annotation scheme achieves a Cohen’s kappa agreement of 0.63 on the gold test set (173 sentences), demonstrating the complexity and reliability of the task. We conduct extensive analysis revealing significant correlations between hyperpartisan content and specific rhetorical techniques, particularly in climate change, Euroscepticism, and green policy coverage. To the best of our knowledge, we are the first to study hyperpartisan detection in relation to logical fallacies, examining the correlation between the two. Moreover, to our knowledge, no previous work has focused on hyperpartisan detection at the sentence level. Our experiments with state-of-the-art language models (GPT-4o-mini) and Italian BERT-base models establish strong baselines for both tasks, while highlighting the challenges in detecting subtle manipulation strategies applied with rhetorical biases. To ensure reproducibility while addressing copyright concerns, we release article URLs, article IDs, and paragraph numbers alongside comprehensive annotation guidelines. This resource advances research in cross-lingual propaganda detection and provides insights into the rhetorical strategies employed in Italian climate change discourse. We provide the code and the dataset to reproduce our results: https://anonymous.4open.science/r/Climate_HP-RB-D5EF/README.md
Annotating the Annotators: Analysis, Insights and Modelling from an Annotation Campaign on Persuasion Techniques Detection
Davide Bassi | Dimitar Iliyanov Dimitrov | Bernardo D’Auria | Firoj Alam | Maram Hasanain | Christian Moro | Luisa Orrù | Gian Piero Turchi | Preslav Nakov | Giovanni Da San Martino
Findings of the Association for Computational Linguistics: ACL 2025
Persuasion (or propaganda) technique detection is a relatively novel task in Natural Language Processing (NLP). While there have already been a number of annotation campaigns, they have been based on heuristic guidelines, which have never been thoroughly discussed. Here, we present the first systematic analysis of a complex annotation task (detecting 22 persuasion techniques in memes), for which we provided continuous expert oversight. The presence of an expert allowed us to critically analyze specific aspects of the annotation process. Among our findings, we show that inter-annotator agreement alone inadequately assessed annotation correctness. We thus define and track different error types, revealing that expert feedback shows varying effectiveness across error categories. This pattern suggests that distinct mechanisms underlie different kinds of misannotations. Based on our findings, we advocate for expert oversight in annotation tasks and periodic quality audits. As an attempt to reduce the costs of this, we introduce a probabilistic model for optimizing intervention scheduling.
Linguistic Markers of Population Replacement Conspiracy Theories in YouTube Immigration Discourse
Erik Bran Marino | Davide Bassi | Renata Vieira
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)
Old but Gold: LLM-Based Features and Shallow Learning Methods for Fine-Grained Controversy Analysis in YouTube Comments
Davide Bassi | Erik Bran Marino | Renata Vieira | Martin Pereira
Proceedings of the 12th Argument mining Workshop
Online discussions can either bridge differences through constructive dialogue or amplify divisions through destructive interactions. This paper proposes a computational approach to analyze dialogical relation patterns in YouTube comments, offering a fine-grained framework for controversy detection that also enables analysis of individual contributions. Our experiments demonstrate that shallow learning methods, when equipped with these theoretically grounded features, consistently outperform more complex language models in characterizing discourse quality at both comment-pair and conversation-chain levels. Our studies confirm that divisive rhetorical techniques serve as strong predictors of destructive communication patterns. This work advances understanding of how communicative choices shape online discourse, moving beyond engagement metrics toward nuanced examination of constructive versus destructive dialogue patterns.