Ricardo Campos


2025

SemEval 2025 Task 10: Multilingual Characterization and Extraction of Narratives from Online News
Jakub Piskorski | Tarek Mahmoud | Nikolaos Nikolaidis | Ricardo Campos | Alipio Mario Jorge | Dimitar Dimitrov | Purificação Silvano | Roman Yangarber | Shivam Sharma | Tanmoy Chakraborty | Nuno Guimaraes | Elisa Sartori | Nicolas Stefanovitch | Zhuohan Xie | Preslav Nakov | Giovanni Da San Martino
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

We introduce SemEval-2025 Task 10 on Multilingual Characterization and Extraction of Narratives from Online News, which focuses on the identification and analysis of narratives in online news media. The task is structured into three subtasks: (1) Entity Framing, to identify the roles that relevant entities play within narratives, (2) Narrative Classification, to assign documents fine-grained narratives according to a given, topic-specific taxonomy of narrative labels, and (3) Narrative Extraction, to provide a justification for the dominant narrative of the document. To this end, we analyze news articles across two critical domains, Ukraine-Russia War and Climate Change, in five languages: Bulgarian, English, Hindi, Portuguese, and Russian. This task introduces a novel multilingual and multifaceted framework for studying how online news media construct and disseminate manipulative narratives. By addressing these challenges, our work contributes to the broader effort of detecting, understanding, and mitigating the spread of propaganda and disinformation. The task attracted considerable interest: 310 teams registered, with 66 submitting official results on the test set.

2024

text2story: A Python Toolkit to Extract and Visualize Story Components of Narrative Text
Evelin Amorim | Ricardo Campos | Alipio Jorge | Pedro Mota | Rúben Almeida
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Story components, namely events, time, participants, and their relations, are present in narrative texts from different domains such as journalism, medicine, finance, and law. The automatic extraction of narrative elements encompasses several NLP tasks such as Named Entity Recognition, Semantic Role Labeling, Event Extraction, Coreference Resolution, and Temporal Inference. The text2story Python package, an easy-to-use modular library, supports the narrative extraction and visualization pipeline. The package contains an array of narrative extraction tools that can be used separately or in sequence. With this toolkit, end users can process free text in English or Portuguese and obtain formal representations, like standard annotation files or a formal logical representation. The toolkit also enables narrative visualization as Message Sequence Charts (MSC), Knowledge Graphs, and Bubble Diagrams, making it useful to visualize and transform human-annotated narratives. The package combines the use of off-the-shelf and custom tools and is easily patched (replacing existing components) and extended (e.g. with new visualizations). It includes an experimental module for narrative element effectiveness assessment and is therefore also a valuable asset for researchers developing solutions for narrative extraction. To evaluate the baseline components, we present some results of the main annotators embedded in our package for datasets in English and Portuguese. We also compare the results with the extraction of narrative elements by GPT-3, a robust LLM.

Text2Story Lusa: A Dataset for Narrative Analysis in European Portuguese News Articles
Sérgio Nunes | Alípio Mario Jorge | Evelin Amorim | Hugo Sousa | António Leal | Purificação Moura Silvano | Inês Cantante | Ricardo Campos
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Narratives have been the subject of extensive research across various scientific fields such as linguistics and computer science. However, the scarcity of freely available datasets, essential for studying this genre, remains a significant obstacle. Furthermore, datasets annotated with narrative components and their morphosyntactic and semantic information are even scarcer. To address this gap, we developed the Text2Story Lusa datasets, which consist of a collection of news articles in European Portuguese. The first dataset consists of 357 news articles, and the second dataset comprises a subset of 117 manually densely annotated articles, totaling over 50 thousand individual annotations. By focusing on texts with substantial narrative elements, we aim to provide a valuable resource for studying narrative structures in European Portuguese news articles. On the one hand, the first dataset provides researchers with data to study narratives from various perspectives. On the other hand, the annotated dataset facilitates research in information extraction and related tasks, particularly in the context of narrative extraction pipelines. Both datasets are made available adhering to FAIR principles, thereby enhancing their utility within the research community.

Indexing Portuguese NLP Resources with PT-Pump-Up
Rúben Almeida | Ricardo Campos | Alípio Jorge | Sérgio Nunes
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 2

Perfil Público: Automatic Generation and Visualization of Author Profiles for Digital News Media
Nuno Guimarães | Ricardo Campos | Alípio Jorge
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 2

2022

The place of ISO-Space in Text2Story multilayer annotation scheme
António Leal | Purificação Silvano | Evelin Amorim | Inês Cantante | Fátima Silva | Alípio Mario Jorge | Ricardo Campos
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022

Reasoning about spatial information is fundamental in natural language to fully understand relationships between entities and/or between events. However, the complexity underlying such reasoning makes it hard to formally represent spatial information. Despite the growing interest in this topic, and the development of some frameworks, many problems persist regarding, for instance, the coverage of a wide variety of linguistic constructions and of languages. In this paper, we present a proposal for integrating ISO-Space into an ISO-based multilayer annotation scheme, designed to annotate news in European Portuguese. This scheme already enables annotation at three levels, temporal, referential and thematic, by combining postulates from ISO 24617-1, 4 and 9. Since the corpus comprises news articles, and spatial information is relevant within this kind of text, a more detailed account of space was required. The main objective of this paper is to discuss the process of integrating ISO-Space with the existing layers of our annotation scheme, assessing the compatibility of the aforementioned parts of ISO 24617, and the problems posed by the harmonization of the four layers and by some specifications of ISO-Space.