2025
PolyNarrative: A Multilingual, Multilabel, Multi-domain Dataset for Narrative Extraction from News Articles
Nikolaos Nikolaidis | Nicolas Stefanovitch | Purificação Silvano | Dimitar Iliyanov Dimitrov | Roman Yangarber | Nuno Guimarães | Elisa Sartori | Ion Androutsopoulos | Preslav Nakov | Giovanni Da San Martino | Jakub Piskorski
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We present polyNarrative, a new multilingual dataset of news articles, annotated for narratives. Narratives are overt or implicit claims, recurring across articles and languages, promoting a specific interpretation or viewpoint on an ongoing topic, often propagating mis/disinformation. We developed two-level taxonomies with coarse- and fine-grained narrative labels for two domains: (i) climate change and (ii) the military conflict between Ukraine and Russia. We collected news articles in four languages (Bulgarian, English, Portuguese, and Russian) related to the two domains and manually annotated them at the paragraph level. We make the dataset publicly available, along with experimental results of several strong baselines that assign narrative labels to news articles at the paragraph or the document level. We believe that this dataset will foster research in narrative detection and enable new research directions towards more multi-domain and highly granular narrative-related tasks.
Entity Framing and Role Portrayal in the News
Tarek Mahmoud | Zhuohan Xie | Dimitar Iliyanov Dimitrov | Nikolaos Nikolaidis | Purificação Silvano | Roman Yangarber | Shivam Sharma | Elisa Sartori | Nicolas Stefanovitch | Giovanni Da San Martino | Jakub Piskorski | Preslav Nakov
Findings of the Association for Computational Linguistics: ACL 2025
We introduce a novel multilingual and hierarchical corpus annotated for entity framing and role portrayal in news articles. The dataset uses a unique taxonomy inspired by storytelling elements, comprising 22 fine-grained roles, or archetypes, nested within three main categories: protagonist, antagonist, and innocent. Each archetype is carefully defined, capturing nuanced portrayals of entities such as guardian, martyr, and underdog for protagonists; tyrant, deceiver, and bigot for antagonists; and victim, scapegoat, and exploited for innocents. The dataset includes 1,378 recent news articles in five languages (Bulgarian, English, Hindi, European Portuguese, and Russian) focusing on two critical domains of global significance: the Ukraine-Russia War and Climate Change. Over 5,800 entity mentions have been annotated with role labels. This dataset serves as a valuable resource for research into role portrayal and has broader implications for news analysis. We describe the characteristics of the dataset and the annotation process, and we report evaluation results on fine-tuned state-of-the-art multilingual transformers and hierarchical zero-shot learning using LLMs at the level of a document, a paragraph, and a sentence.
NarratEX Dataset: Explaining the Dominant Narratives in News Texts
Nuno Guimarães | Purificação Silvano | Ricardo Campos | Alipio Jorge | Ana Filipa Pacheco | Dimitar Iliyanov Dimitrov | Nikolaos Nikolaidis | Roman Yangarber | Elisa Sartori | Nicolas Stefanovitch | Preslav Nakov | Jakub Piskorski | Giovanni Da San Martino
Findings of the Association for Computational Linguistics: EMNLP 2025
We present NarratEX, a dataset designed for the task of explaining the choice of the Dominant Narrative in a news article, and intended to support the research community in addressing challenges such as discourse polarization and propaganda detection. Our dataset comprises 1,056 news articles in four languages, Bulgarian, English, Portuguese, and Russian, covering two globally significant topics: the Ukraine-Russia War (URW) and Climate Change (CC). Each article is manually annotated with a dominant narrative and sub-narrative labels, and an explanation justifying the chosen labels. We describe the dataset, the process of its creation, and its characteristics. We present experiments with two new proposed tasks: Explaining Dominant Narrative based on Text, which involves writing a concise paragraph to justify the choice of the dominant narrative and sub-narrative of a given text, and Inferring Dominant Narrative from Explanation, which involves predicting the appropriate dominant narrative category based on an explanatory text. The proposed dataset is a valuable resource for advancing research on detecting and mitigating manipulative content, while promoting a deeper understanding of how narratives influence public discourse.
PropXplain: Can LLMs Enable Explainable Propaganda Detection?
Maram Hasanain | Md Arid Hasan | Mohamed Bayan Kmainasi | Elisa Sartori | Ali Ezzat Shahroor | Giovanni Da San Martino | Firoj Alam
Findings of the Association for Computational Linguistics: EMNLP 2025
There has been significant research on propagandistic content detection across different modalities and languages. However, most studies have primarily focused on detection, with little attention given to explanations justifying the predicted label. This is largely due to the lack of resources that provide explanations alongside annotated labels. To address this issue, we propose a multilingual (i.e., Arabic and English) explanation-enhanced dataset, the first of its kind. Additionally, we introduce an explanation-enhanced LLM for both label detection and rationale-based explanation generation. Our findings indicate that the model performs comparably while also generating explanations. We will make the dataset and experimental resources publicly available for the research community (https://github.com/firojalam/PropXplain).
Insights into using temporal coordinated behaviour to explore connections between social media posts and influence
Elisa Sartori | Serena Tardelli | Maurizio Tesconi | Mauro Conti | Alessandro Galeazzi | Stefano Cresci | Giovanni Da San Martino
Findings of the Association for Computational Linguistics: EMNLP 2025
Political campaigns increasingly rely on targeted strategies to influence voters on social media. Often, such campaigns have been studied by analysing coordinated behaviour to identify communities of users who exhibit similar patterns. While these analyses are typically conducted on static networks, recent extensions to temporal networks allow tracking users who change communities over time, opening new opportunities to quantitatively study influence in social networks. As a first step toward this goal, we analyse the messages users were exposed to during the UK 2019 election, comparing those received by users who shifted communities with others covering the same topics. Our findings reveal 54 statistically significant linguistic differences and show that a subset of persuasion techniques, including loaded language, exaggeration and minimization, doubt, and flag-waving, are particularly relevant to users’ shifts. This work underscores the importance of analysing coordination from a temporal and dynamic perspective to infer the drivers of users’ shifts in online debate.
SemEval 2025 Task 10: Multilingual Characterization and Extraction of Narratives from Online News
Jakub Piskorski | Tarek Mahmoud | Nikolaos Nikolaidis | Ricardo Campos | Alipio Mario Jorge | Dimitar Dimitrov | Purificação Silvano | Roman Yangarber | Shivam Sharma | Tanmoy Chakraborty | Nuno Guimaraes | Elisa Sartori | Nicolas Stefanovitch | Zhuohan Xie | Preslav Nakov | Giovanni Da San Martino
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
We introduce SemEval-2025 Task 10 on Multilingual Characterization and Extraction of Narratives from Online News, which focuses on the identification and analysis of narratives in online news media. The task is structured into three subtasks: (1) Entity Framing, to identify the roles that relevant entities play within narratives, (2) Narrative Classification, to assign documents fine-grained narratives according to a given, topic-specific taxonomy of narrative labels, and (3) Narrative Extraction, to provide a justification for the dominant narrative of the document. To this end, we analyze news articles across two critical domains, Ukraine-Russia War and Climate Change, in five languages: Bulgarian, English, Hindi, Portuguese, and Russian. This task introduces a novel multilingual and multifaceted framework for studying how online news media construct and disseminate manipulative narratives. By addressing these challenges, our work contributes to the broader effort of detecting, understanding, and mitigating the spread of propaganda and disinformation. The task attracted a lot of interest: 310 teams registered, with 66 submitting official results on the test set.
2023
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Atul Kr. Ojha | A. Seza Doğruöz | Giovanni Da San Martino | Harish Tayyar Madabushi | Ritesh Kumar | Elisa Sartori
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)