2025
PolyNarrative: A Multilingual, Multilabel, Multi-domain Dataset for Narrative Extraction from News Articles
Nikolaos Nikolaidis | Nicolas Stefanovitch | Purificação Silvano | Dimitar Iliyanov Dimitrov | Roman Yangarber | Nuno Guimarães | Elisa Sartori | Ion Androutsopoulos | Preslav Nakov | Giovanni Da San Martino | Jakub Piskorski
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We present PolyNarrative, a new multilingual dataset of news articles annotated for narratives. Narratives are overt or implicit claims, recurring across articles and languages, that promote a specific interpretation or viewpoint on an ongoing topic, often propagating mis/disinformation. We developed two-level taxonomies with coarse- and fine-grained narrative labels for two domains: (i) climate change and (ii) the military conflict between Ukraine and Russia. We collected news articles in four languages (Bulgarian, English, Portuguese, and Russian) related to the two domains and manually annotated them at the paragraph level. We make the dataset publicly available, along with experimental results of several strong baselines that assign narrative labels to news articles at the paragraph or the document level. We believe that this dataset will foster research in narrative detection and enable new research directions towards more multi-domain and highly granular narrative-related tasks.
Entity Framing and Role Portrayal in the News
Tarek Mahmoud | Zhuohan Xie | Dimitar Iliyanov Dimitrov | Nikolaos Nikolaidis | Purificação Silvano | Roman Yangarber | Shivam Sharma | Elisa Sartori | Nicolas Stefanovitch | Giovanni Da San Martino | Jakub Piskorski | Preslav Nakov
Findings of the Association for Computational Linguistics: ACL 2025
We introduce a novel multilingual and hierarchical corpus annotated for entity framing and role portrayal in news articles. The dataset uses a unique taxonomy inspired by storytelling elements, comprising 22 fine-grained roles, or archetypes, nested within three main categories: protagonist, antagonist, and innocent. Each archetype is carefully defined, capturing nuanced portrayals of entities such as guardian, martyr, and underdog for protagonists; tyrant, deceiver, and bigot for antagonists; and victim, scapegoat, and exploited for innocents. The dataset includes 1,378 recent news articles in five languages (Bulgarian, English, Hindi, European Portuguese, and Russian) focusing on two critical domains of global significance: the Ukraine-Russia War and Climate Change. Over 5,800 entity mentions have been annotated with role labels. This dataset serves as a valuable resource for research into role portrayal and has broader implications for news analysis. We describe the characteristics of the dataset and the annotation process, and we report evaluation results on fine-tuned state-of-the-art multilingual transformers and hierarchical zero-shot learning using LLMs at the level of a document, a paragraph, and a sentence.
2024
Exploring the Usability of Persuasion Techniques for Downstream Misinformation-related Classification Tasks
Nikolaos Nikolaidis | Jakub Piskorski | Nicolas Stefanovitch
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
We systematically explore the predictive power of features derived from Persuasion Techniques detected in texts for solving different tasks of interest for media analysis, notably detecting mis/disinformation, fake news, propaganda, partisan news, and conspiracy theories. Firstly, we propose a set of meaningful features aiming to capture the persuasiveness of a text. Secondly, we assess the discriminatory power of these features in different text classification tasks on 8 selected datasets from the literature, using two metrics. We also evaluate the per-task discriminatory power of each Persuasion Technique and report the resulting insights. We find that most of these features have a noticeable potential to distinguish conspiracy theories, hyperpartisan news, and propaganda, while we observe mixed results in the context of fake news detection.
2023
Multilingual Multifaceted Understanding of Online News in Terms of Genre, Framing, and Persuasion Techniques
Jakub Piskorski | Nicolas Stefanovitch | Nikolaos Nikolaidis | Giovanni Da San Martino | Preslav Nakov
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We present a new multilingual multifaceted dataset of news articles, each annotated for genre (objective news reporting vs. opinion vs. satire), framing (what key aspects are highlighted), and persuasion techniques (logical fallacies, emotional appeals, ad hominem attacks, etc.). The persuasion techniques are annotated at the span level, using a taxonomy of 23 fine-grained techniques grouped into 6 coarse categories. The dataset contains 1,612 news articles covering recent news on current topics of public interest in six European languages (English, French, German, Italian, Polish, and Russian), with more than 37k annotated spans of persuasion techniques. We describe the dataset and the annotation process, and we report the evaluation results of multilabel classification experiments using state-of-the-art multilingual transformers at different levels of granularity: token-level, sentence-level, paragraph-level, and document-level.
On Experiments of Detecting Persuasion Techniques in Polish and Russian Online News: Preliminary Study
Nikolaos Nikolaidis | Nicolas Stefanovitch | Jakub Piskorski
Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023)
This paper reports the results of preliminary experiments on the detection of persuasion techniques in online news in Polish and Russian, using a taxonomy of 23 persuasion techniques. The evaluation addresses different aspects, namely, the granularity of the persuasion technique categories, i.e., coarse-grained (6 labels) versus fine-grained (23 labels), and the focus of the classification, i.e., the level at which the labels are detected (subword, sentence, or paragraph). We compare the performance of monolingually versus multilingually trained state-of-the-art transformer-based models in this context.