Jean-Philippe Corbeil

Also published as: Jean-philippe Corbeil


2024

IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task on the Shoulders of Medical Agents
Jean-Philippe Corbeil
Proceedings of the 6th Clinical Natural Language Processing Workshop

In natural language processing applied to the clinical domain, large language models have emerged as a promising avenue for error detection and correction in clinical notes, a knowledge-intensive task for which annotated data is scarce. This paper presents MedReAct’N’MedReFlex, which leverages a suite of four LLM-based medical agents. The MedReAct agent initiates the process by observing, analyzing, and acting, generating trajectories that guide the search toward a potential error in the clinical note. The MedEval agent then employs five evaluators to assess the targeted error and the proposed correction. When MedReAct’s actions prove insufficient, the MedReFlex agent intervenes, engaging in reflective analysis and proposing alternative strategies. Finally, the MedFinalParser agent formats the final output, preserving the original style while ensuring the integrity of the error correction. One core component of our method is a RAG pipeline based on our ClinicalCorp corpora. Alongside other well-known sources of clinical guidelines and information, we preprocess and release the open-source MedWiki dataset for clinical RAG applications. Our results demonstrate the central role of the RAG approach over ClinicalCorp within the MedReAct’N’MedReFlex framework, which achieved ninth place on the MEDIQA-CORR 2024 final leaderboard.
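A minimal sketch of how the four-agent loop described in the abstract could be wired together. The agent names come from the paper, but everything else here is an illustrative assumption: the call_llm helper, the prompts, the scoring scale, and the retry logic are hypothetical, and the paper's ClinicalCorp retrieval step is omitted entirely.

# Hypothetical sketch of the MedReAct'N'MedReFlex loop; prompts, scoring,
# and control flow are assumptions, not the paper's implementation.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; plug in your own client here."""
    raise NotImplementedError("connect an LLM endpoint")

def med_react(note: str, feedback: str = "") -> dict:
    """Observe/analyze/act: propose a target error span and a correction."""
    answer = call_llm(f"Note:\n{note}\nFeedback:\n{feedback}\nFind one error and propose a fix.")
    return {"error_span": answer, "correction": answer}

def med_eval(note: str, candidate: dict, n_evaluators: int = 5) -> float:
    """Average the scores of five independent LLM evaluators."""
    scores = [float(call_llm(f"Rate 0-10 this correction of the note:\n{note}\n{candidate}"))
              for _ in range(n_evaluators)]
    return sum(scores) / n_evaluators

def med_reflex(note: str, candidate: dict, score: float) -> str:
    """Reflect on an insufficient attempt and suggest an alternative strategy."""
    return call_llm(f"The correction {candidate} scored {score}. Suggest a better strategy.")

def med_final_parser(note: str, candidate: dict) -> str:
    """Format the corrected note while preserving the original style."""
    return call_llm(f"Apply this correction, keeping the note's style:\n{note}\n{candidate}")

def correct_note(note: str, threshold: float = 7.0, max_rounds: int = 3) -> str:
    feedback = ""
    for _ in range(max_rounds):
        candidate = med_react(note, feedback)
        score = med_eval(note, candidate)
        if score >= threshold:
            return med_final_parser(note, candidate)
        feedback = med_reflex(note, candidate, score)
    return note  # fall back to the original note if no correction passes

The key design point the sketch tries to capture is the division of labor: generation (MedReAct), multi-evaluator scoring (MedEval), reflection on failure (MedReFlex), and style-preserving output formatting (MedFinalParser).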

2023

NLU on Data Diets: Dynamic Data Subset Selection for NLP Classification Tasks
Jean-Michel Attendu | Jean-Philippe Corbeil
Proceedings of The Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP)

2021

Assessing the Eligibility of Backtranslated Samples Based on Semantic Similarity for the Paraphrase Identification Task
Jean-Philippe Corbeil | Hadi Abdi Ghavidel
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

In the domain of natural language augmentation, the eligibility of generated samples remains poorly understood. To gather insights into this eligibility issue, we apply a transformer-based similarity calculation within the backtranslation-based BET framework, in the context of automated paraphrase detection. While providing a rigorous statistical foundation for BET, we extend its results by statistically analyzing the impact of the qualification level and of several sample sizes. We conducted extensive experiments on the MRPC corpus using six pre-trained models: BERT, XLNet, ALBERT, RoBERTa, ELECTRA, and DeBERTa. We show that our method significantly improves these “base” models while using only a fraction of the corpus. Our results suggest that some of the smaller pre-trained models, namely RoBERTa base and ELECTRA base, reach F1 scores very close to those of their large counterparts as reported on the GLUE benchmark. On top of acting as a regularizer, the proposed method is effective against data scarcity, with improvements of around 3% in F1 score for most pre-trained models, and more than 7.5% in the case of ELECTRA.
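A minimal sketch of the eligibility filter the abstract describes, using the sentence-transformers library for the semantic-similarity check. The encoder name and the similarity thresholds below are illustrative assumptions, not the paper's exact configuration, and the backtranslation step itself is assumed to have already produced the candidate paraphrases.

# Illustrative eligibility filter for backtranslated samples; the encoder
# and thresholds are assumptions, not the paper's exact settings.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

def eligible_backtranslations(original: str, candidates: list[str],
                              low: float = 0.80, high: float = 0.99) -> list[str]:
    """Keep backtranslations that are semantically close to the original
    but not verbatim copies (cosine similarity inside [low, high])."""
    emb_orig = encoder.encode(original, convert_to_tensor=True)
    emb_cand = encoder.encode(candidates, convert_to_tensor=True)
    sims = util.cos_sim(emb_orig, emb_cand)[0]
    return [c for c, s in zip(candidates, sims) if low <= float(s) <= high]

# Example: filter candidates produced by a backtranslation model.
kept = eligible_backtranslations(
    "The patient was discharged after three days.",
    ["The patient left the hospital after three days.",
     "The patient was discharged after three days.",   # verbatim: similarity ~1, above high
     "Three days later, the shipment was discharged."])  # drifted meaning: likely below low
print(kept)

The upper bound discards verbatim round-trips that add no augmentation value, while the lower bound discards backtranslations whose meaning drifted too far from the source to be a valid paraphrase.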