Nuno Guimarães

Also published as: Nuno Guimaraes


2025

PolyNarrative: A Multilingual, Multilabel, Multi-domain Dataset for Narrative Extraction from News Articles
Nikolaos Nikolaidis | Nicolas Stefanovitch | Purificação Silvano | Dimitar Iliyanov Dimitrov | Roman Yangarber | Nuno Guimarães | Elisa Sartori | Ion Androutsopoulos | Preslav Nakov | Giovanni Da San Martino | Jakub Piskorski
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present polyNarrative, a new multilingual dataset of news articles, annotated for narratives. Narratives are overt or implicit claims, recurring across articles and languages, promoting a specific interpretation or viewpoint on an ongoing topic, often propagating mis/disinformation. We developed two-level taxonomies with coarse- and fine-grained narrative labels for two domains: (i) climate change and (ii) the military conflict between Ukraine and Russia. We collected news articles in four languages (Bulgarian, English, Portuguese, and Russian) related to the two domains and manually annotated them at the paragraph level. We make the dataset publicly available, along with experimental results of several strong baselines that assign narrative labels to news articles at the paragraph or the document level. We believe that this dataset will foster research in narrative detection and enable new research directions towards more multi-domain and highly granular narrative-related tasks.

Enhancing an Annotation Scheme for Clinical Narratives in Portuguese through Human Variation Analysis
Ana Luisa Fernandes | Purificação Silvano | António Leal | Nuno Guimarães | Rita Rb-Silva | Luís Filipe Cunha | Alípio Jorge
Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025)

The development of a robust annotation scheme and corresponding guidelines is crucial for producing annotated datasets that advance both linguistic and computational research. This paper presents a case study that outlines a methodology for designing an annotation scheme and its guidelines, specifically aimed at representing morphosyntactic and semantic information regarding temporal features, as well as medical information in medical reports written in Portuguese. We detail a multi-step process that includes reviewing existing frameworks, conducting an annotation experiment to determine the optimal approach, and designing a model based on these findings. We validated the approach through a pilot experiment where we assessed the reliability and applicability of the annotation scheme and guidelines. In this experiment, two annotators independently annotated a patient’s medical report consisting of six documents using the proposed model, while a curator established the ground truth. The analysis of inter-annotator agreement and the annotation results enabled the identification of sources of human variation and provided insights for further refinement of the annotation scheme and guidelines.

SemEval 2025 Task 10: Multilingual Characterization and Extraction of Narratives from Online News
Jakub Piskorski | Tarek Mahmoud | Nikolaos Nikolaidis | Ricardo Campos | Alipio Mario Jorge | Dimitar Dimitrov | Purificação Silvano | Roman Yangarber | Shivam Sharma | Tanmoy Chakraborty | Nuno Guimaraes | Elisa Sartori | Nicolas Stefanovitch | Zhuohan Xie | Preslav Nakov | Giovanni Da San Martino
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

We introduce SemEval-2025 Task 10 on Multilingual Characterization and Extraction of Narratives from Online News, which focuses on the identification and analysis of narratives in online news media. The task is structured into three subtasks: (1) Entity Framing, to identify the roles that relevant entities play within narratives, (2) Narrative Classification, to assign documents fine-grained narratives according to a given, topic-specific taxonomy of narrative labels, and (3) Narrative Extraction, to provide a justification for the dominant narrative of the document. To this end, we analyze news articles across two critical domains, Ukraine-Russia War and Climate Change, in five languages: Bulgarian, English, Hindi, Portuguese, and Russian. This task introduces a novel multilingual and multifaceted framework for studying how online news media construct and disseminate manipulative narratives. By addressing these challenges, our work contributes to the broader effort of detecting, understanding, and mitigating the spread of propaganda and disinformation. The task attracted a lot of interest: 310 teams registered, with 66 submitting official results on the test set.

2024

Perfil Público: Automatic Generation and Visualization of Author Profiles for Digital News Media
Nuno Guimarães | Ricardo Campos | Alípio Jorge
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 2