Dimitar Iliyanov Dimitrov


2025

PolyNarrative: A Multilingual, Multilabel, Multi-domain Dataset for Narrative Extraction from News Articles
Nikolaos Nikolaidis | Nicolas Stefanovitch | Purificação Silvano | Dimitar Iliyanov Dimitrov | Roman Yangarber | Nuno Guimarães | Elisa Sartori | Ion Androutsopoulos | Preslav Nakov | Giovanni Da San Martino | Jakub Piskorski
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present PolyNarrative, a new multilingual dataset of news articles annotated for narratives. Narratives are overt or implicit claims that recur across articles and languages, promoting a specific interpretation or viewpoint on an ongoing topic and often propagating mis/disinformation. We developed two-level taxonomies with coarse- and fine-grained narrative labels for two domains: (i) climate change and (ii) the military conflict between Ukraine and Russia. We collected news articles in four languages (Bulgarian, English, Portuguese, and Russian) related to the two domains and manually annotated them at the paragraph level. We make the dataset publicly available, along with experimental results for several strong baselines that assign narrative labels to news articles at the paragraph or the document level. We believe this dataset will foster research in narrative detection and open new research directions towards multi-domain, highly granular narrative-related tasks.

Entity Framing and Role Portrayal in the News
Tarek Mahmoud | Zhuohan Xie | Dimitar Iliyanov Dimitrov | Nikolaos Nikolaidis | Purificação Silvano | Roman Yangarber | Shivam Sharma | Elisa Sartori | Nicolas Stefanovitch | Giovanni Da San Martino | Jakub Piskorski | Preslav Nakov
Findings of the Association for Computational Linguistics: ACL 2025

We introduce a novel multilingual, hierarchical corpus annotated for entity framing and role portrayal in news articles. The dataset uses a unique taxonomy inspired by storytelling elements, comprising 22 fine-grained roles, or archetypes, nested within three main categories: protagonist, antagonist, and innocent. Each archetype is carefully defined, capturing nuanced portrayals of entities such as guardian, martyr, and underdog for protagonists; tyrant, deceiver, and bigot for antagonists; and victim, scapegoat, and exploited for innocents. The dataset includes 1,378 recent news articles in five languages (Bulgarian, English, Hindi, European Portuguese, and Russian), focusing on two critical domains of global significance: the Ukraine-Russia War and Climate Change. Over 5,800 entity mentions have been annotated with role labels. The dataset serves as a valuable resource for research into role portrayal and has broader implications for news analysis. We describe the characteristics of the dataset and the annotation process, and we report evaluation results for fine-tuned state-of-the-art multilingual transformers and for hierarchical zero-shot learning with LLMs at the document, paragraph, and sentence levels.

Annotating the Annotators: Analysis, Insights and Modelling from an Annotation Campaign on Persuasion Techniques Detection
Davide Bassi | Dimitar Iliyanov Dimitrov | Bernardo D’Auria | Firoj Alam | Maram Hasanain | Christian Moro | Luisa Orrù | Gian Piero Turchi | Preslav Nakov | Giovanni Da San Martino
Findings of the Association for Computational Linguistics: ACL 2025

Detection of persuasion (or propaganda) techniques is a relatively novel task in Natural Language Processing (NLP). While there have already been a number of annotation campaigns, they have relied on heuristic guidelines that have never been thoroughly discussed. Here, we present the first systematic analysis of a complex annotation task, detecting 22 persuasion techniques in memes, for which we provided continuous expert oversight. The presence of an expert allowed us to critically analyze specific aspects of the annotation process. Among our findings, we show that inter-annotator agreement alone is insufficient to assess annotation correctness. We therefore define and track different error types, revealing that expert feedback varies in effectiveness across error categories. This pattern suggests that distinct mechanisms underlie different kinds of misannotations. Based on our findings, we advocate for expert oversight in annotation tasks and for periodic quality audits. To reduce the associated costs, we introduce a probabilistic model for optimizing the scheduling of expert interventions.