Gonçalo Correia

2025

Effective Multi-Task Learning for Biomedical Named Entity Recognition
João Ruano | Gonçalo Correia | Leonor Barreiros | Afonso Mendes
Proceedings of the 24th Workshop on Biomedical Language Processing

Biomedical Named Entity Recognition presents significant challenges due to the complexity of biomedical terminology and inconsistencies in annotation across datasets. This paper introduces SRU-NER (Slot-based Recurrent Unit NER), a novel approach designed to handle nested named entities while integrating multiple datasets through an effective multi-task learning strategy. SRU-NER mitigates annotation gaps by dynamically adjusting loss computation to avoid penalizing predictions of entity types absent in a given dataset. Through extensive experiments, including a cross-corpus evaluation and human assessment of the model’s predictions, SRU-NER achieves competitive performance in biomedical and general-domain NER tasks, while improving cross-domain generalization.
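The loss adjustment described in this abstract can be illustrated with a small sketch. The abstract does not give the actual SRU-NER loss, so everything below is an assumption made for illustration: a per-type binary cross-entropy, the function name `masked_bce_loss`, and the dictionary-based inputs are all hypothetical. The only idea taken from the abstract is that entity types a dataset does not annotate contribute nothing to the loss.

```python
import math

def masked_bce_loss(logits, targets, annotated_types):
    """Mean binary cross-entropy over annotated entity types only.

    Hypothetical helper, not the SRU-NER implementation.
    logits:          dict mapping entity type -> raw score for one span
    targets:         dict mapping entity type -> 0/1 gold label (missing = 0)
    annotated_types: set of entity types the source dataset annotates
    """
    total, count = 0.0, 0
    for etype, logit in logits.items():
        if etype not in annotated_types:
            # The dataset is silent about this type, so a positive
            # prediction here is neither rewarded nor penalized.
            continue
        p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
        y = targets.get(etype, 0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        count += 1
    return total / count if count else 0.0

# A span predicted as both Gene and Disease, drawn from a gene-only corpus.
logits = {"Gene": 2.0, "Disease": 3.0}
targets = {"Gene": 1}
masked = masked_bce_loss(logits, targets, annotated_types={"Gene"})
unmasked = masked_bce_loss(logits, targets, annotated_types={"Gene", "Disease"})
```

In this toy case the masked loss is lower than the unmasked one, because the confident Disease prediction is ignored rather than treated as a false positive.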

Explainable ICD Coding via Entity Linking
Leonor Barreiros | Isabel Coutinho | Gonçalo Correia | Bruno Martins
Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health)

Clinical coding is a critical task in healthcare, although traditional methods for automating clinical coding may not provide sufficient explicit evidence for coders in production environments. This evidence is crucial, as medical coders have to make sure there exists at least one explicit passage in the input health record that justifies the attribution of a code. We therefore propose to reframe the task as an entity linking problem, in which each document is annotated with its set of codes and respective textual evidence, enabling better human-machine collaboration. By leveraging parameter-efficient fine-tuning of Large Language Models (LLMs), together with constrained decoding, we introduce three approaches to solve this problem that prove effective at disambiguating clinical mentions and that perform well in few-shot scenarios.

2023

Supervising the Centroid Baseline for Extractive Multi-Document Summarization
Simão Gonçalves | Gonçalo Correia | Diogo Pernes | Afonso Mendes
Proceedings of the 4th New Frontiers in Summarization Workshop

The centroid method is a simple approach for extractive multi-document summarization, and many improvements to its pipeline have been proposed. We further refine it by adding a beam search process to the sentence selection and a centroid estimation attention model, which leads to improved results. We demonstrate this on several multi-document summarization datasets, including a multilingual scenario.
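As a rough illustration of combining the centroid method with beam search, the sketch below selects sentences whose mean vector is closest to the centroid of all sentence vectors. The abstract does not specify the scoring function or the sentence representations, so cosine similarity, the plain mean as centroid, and the names `cosine`, `mean_vector`, and `centroid_beam_search` are all assumptions. The learned attention model for centroid estimation is not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def centroid_beam_search(sentence_vecs, summary_len, beam_width=2):
    """Return indices of the sentences whose mean is closest to the centroid.

    Hypothetical sketch: keeps the `beam_width` best partial summaries
    at each step instead of only the single best (greedy) one.
    """
    if not sentence_vecs:
        return []
    summary_len = min(summary_len, len(sentence_vecs))
    centroid = mean_vector(sentence_vecs)

    def score(indices):
        selected = [sentence_vecs[i] for i in indices]
        return cosine(mean_vector(selected), centroid)

    beams = [()]  # each beam is a sorted tuple of selected sentence indices
    for _ in range(summary_len):
        candidates = set()
        for beam in beams:
            for i in range(len(sentence_vecs)):
                if i not in beam:
                    candidates.add(tuple(sorted(beam + (i,))))
        beams = sorted(candidates, key=score, reverse=True)[:beam_width]
    return list(beams[0])

# Toy 2-d "sentence embeddings"; the centroid points along (1, 1).
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
best_pair = centroid_beam_search(vecs, summary_len=2, beam_width=2)
greedy_pair = centroid_beam_search(vecs, summary_len=2, beam_width=1)
```

On this toy input the greedy search (`beam_width=1`) commits to sentence 2 first and cannot reach the pair of sentences 0 and 1, whose mean matches the centroid direction exactly. A beam of width 2 keeps an alternative first choice alive and finds that pair.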

2022

DeepSPIN is a research project funded by the European Research Council (ERC) whose goal is to develop new neural structured prediction methods, models, and algorithms for improving the quality, interpretability, and data-efficiency of natural language processing (NLP) systems, with special emphasis on machine translation and quality estimation. We describe in this paper the latest findings from this project.