Miguel Couceiro

2025

Comparing representations of long clinical texts for the task of patient-note identification
Safa Alsaidi | Marc Vincent | Olivia Boyer | Nicolas Garcelon | Miguel Couceiro | Adrien Coulet
Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health)

In this paper, we address the challenge of patient-note identification, which involves accurately matching an anonymized clinical note to its corresponding patient, represented by a set of related notes. This task has broad applications, including duplicate record detection and patient similarity analysis, which require robust patient-level representations. We explore various embedding methods, including Hierarchical Attention Networks (HAN), three-level Hierarchical Transformer Networks (HTN), LongFormer, and advanced BERT-based models, focusing on their ability to process medium-to-long clinical texts effectively. Additionally, we evaluate different pooling strategies (mean, max, and mean_max) for aggregating word-level embeddings into patient-level representations, and we examine the impact of sliding windows on model performance. Our results indicate that BERT-based embeddings outperform traditional and hierarchical models, particularly in processing lengthy clinical notes and capturing nuanced patient representations. Among the pooling strategies, mean_max pooling consistently yields the best results, highlighting its ability to capture critical features from clinical notes. Furthermore, the reproduction of our results on both the MIMIC dataset and the Necker hospital data warehouse illustrates the generalizability of these approaches to real-world applications, emphasizing the importance of both embedding methods and aggregation strategies in optimizing patient-note identification and enhancing patient-level modeling.
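
The pooling strategies and sliding windows named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, embeddings are plain lists of floats, and mean_max is assumed here to be the concatenation of the mean-pooled and max-pooled vectors, which may differ from the paper's exact definition.

```python
from typing import List, Sequence

Vector = List[float]


def sliding_windows(tokens: Sequence[str], size: int, stride: int) -> List[List[str]]:
    """Split a token sequence into overlapping windows (hypothetical helper)."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + size]))
        # Stop once the current window reaches the end of the sequence.
        if start + size >= len(tokens):
            break
        start += stride
    return windows


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average each dimension over all input vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def max_pool(vectors: Sequence[Vector]) -> Vector:
    """Take the maximum of each dimension over all input vectors."""
    return [max(column) for column in zip(*vectors)]


def mean_max_pool(vectors: Sequence[Vector]) -> Vector:
    """Assumed definition: concatenate the mean-pooled and max-pooled vectors."""
    return mean_pool(vectors) + max_pool(vectors)
```

Under these assumptions, a long note would be split with `sliding_windows`, each window embedded by some model, and the resulting vectors aggregated with one of the pooling functions; mean_max doubles the dimensionality of the pooled representation.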

EDAR: A pipeline for Emotion and Dialogue Act Recognition
Elie Dina | Rania Ayachi Kibech | Miguel Couceiro
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track

Individuals facing financial difficulties often make decisions driven by emotions rather than rational analysis. EDAR, a pipeline for Emotion and Dialogue Act Recognition, is designed specifically for the debt collection process in France. By integrating EDAR into decision-making systems, debt collection outcomes could be improved. The pipeline employs Machine Learning and Deep Learning models, demonstrating that smaller models with fewer parameters can achieve high performance, offering an efficient alternative to large language models.

Exploring ReAct Prompting for Task-Oriented Dialogue: Insights and Shortcomings
Michelle Elizabeth | Morgan Veyret | Miguel Couceiro | Ondrej Dusek | Lina M. Rojas Barahona
Proceedings of the 15th International Workshop on Spoken Dialogue Systems Technology

Large language models (LLMs) have gained immense popularity due to their impressive capabilities in unstructured conversations. Empowering LLMs with advanced prompting strategies such as reasoning and acting (ReAct) (Yao et al., 2022) has shown promise in solving complex tasks traditionally requiring reinforcement learning. In this work, we apply the ReAct strategy to guide LLMs performing task-oriented dialogue (TOD). We evaluate ReAct-based LLMs (ReAct-LLMs) both in simulation and with real users. While ReAct-LLMs severely underperform state-of-the-art approaches on success rate in simulation, this difference becomes less pronounced in human evaluation. Moreover, compared to the baseline, humans report higher subjective satisfaction with ReAct-LLM despite its lower success rate, most likely thanks to its natural and confidently phrased responses.

2024

The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese
Ajinkya Kulkarni | Anna Tokareva | Rameez Qureshi | Miguel Couceiro
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

In the field of spoken language understanding, systems like Whisper and Multilingual Massive Speech (MMS) have shown state-of-the-art performance. This study is dedicated to a comprehensive exploration of the Whisper and MMS systems, with a focus on assessing biases in automatic speech recognition (ASR) inherent to casual conversation speech specific to the Portuguese language. Our investigation encompasses various categories, including gender, age, skin tone, and geo-location. Alongside traditional ASR evaluation metrics such as Word Error Rate (WER), we have incorporated p-value-based statistical significance testing for gender bias analysis. Furthermore, we extensively examine the impact of data distribution and empirically show that oversampling techniques alleviate such stereotypical biases. This research represents a pioneering effort in quantifying biases in the Portuguese language context through the application of MMS and Whisper, contributing to a better understanding of ASR systems’ performance in multilingual settings.
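
As an illustration of the Word Error Rate metric mentioned in this abstract, the following minimal sketch computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the whitespace tokenization are assumptions made for this example; the paper's evaluation setup, including any text normalization, is not described here and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A bias analysis of the kind described could then compare WER averaged per demographic group, for example per gender or age bracket; the grouping and the significance test are left out of this sketch.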

2021

GECko+: a Grammatical and Discourse Error Correction Tool
Eduardo Calò | Léo Jacqmin | Thibo Rosemplatt | Maxime Amblard | Miguel Couceiro | Ajinkya Kulkarni
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 3 : Démonstrations

We introduce GECko+, a web-based writing assistance tool for English that corrects errors both at the sentence and at the discourse level. It is based on two state-of-the-art models for grammar error correction and sentence ordering. GECko+ is available online as a web application that implements a pipeline combining the two models.

A New Broad NLP Training from Speech to Knowledge
Maxime Amblard | Miguel Couceiro
Proceedings of the Fifth Workshop on Teaching NLP

In 2018, the Master Sc. in NLP opened at IDMC - Institut des Sciences du Digital, du Management et de la Cognition, Université de Lorraine - Nancy, France. Far from being a creation ex nihilo, it is the product of a history and many reflections on the field and its teaching. This article proposes epistemological and critical elements on the opening and maintenance of this still-new master’s program in NLP.