Clément Cormi
Also published as: Clement Cormi
2025
Detecting Omissions in LLM-Generated Medical Summaries
Achir Oukelmoun | Nasredine Semmar | Gaël de Chalendar | Clement Cormi | Mariame Oukelmoun | Eric Vibert | Marc-Antoine Allard
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
With the emergence of Large Language Models (LLMs), numerous use cases have arisen in the medical field, particularly in generating summaries for consultation transcriptions and extensive medical reports. A major concern is that these summaries may omit critical information from the original input, potentially jeopardizing the decision-making process. This issue of omission is distinct from hallucination, which involves generating incorrect or fabricated facts. To address omissions, this paper introduces a dataset designed to evaluate such issues and proposes a frugal approach called EmbedKDECheck for detecting omissions in LLM-generated texts. The dataset, created in French, has been validated by medical experts to ensure it accurately represents real-world scenarios in the medical field. The objective is to develop a reference-free (black-box) method that can evaluate the reliability of summaries or reports without requiring significant computational resources, relying only on input and output. Unlike methods that rely on embeddings derived from the LLM itself, our approach uses embeddings generated by a third-party, lightweight NLP model based on a combination of FastText and Word2Vec. These embeddings are then combined with anomaly detection models to identify omissions effectively, making the method well-suited for resource-constrained environments. EmbedKDECheck was benchmarked against black-box state-of-the-art frameworks and models, including SelfCheckGPT, ChainPoll, and G-Eval, which leverage GPT. Results demonstrated its satisfactory performance in detecting omissions in LLM-generated summaries. This work advances frugal methodologies for evaluating the reliability of LLM-generated texts, with significant potential to improve the safety and accuracy of medical decision support systems in surgery and other healthcare domains.
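The abstract describes the core mechanism only at a high level: lightweight third-party embeddings (FastText/Word2Vec-style) combined with an anomaly-detection model that scores omissions. A minimal sketch of that idea, not the authors' actual algorithm, might look like the following. Here `embed` is a purely illustrative stand-in (deterministic random vectors for character trigrams, not a real trained FastText/Word2Vec model), and `omission_scores` uses a kernel density estimate over summary-sentence embeddings so that source sentences with low density under the summary are flagged as candidate omissions:

```python
import zlib

import numpy as np
from sklearn.neighbors import KernelDensity


def embed(sentence, dim=64):
    """Toy sentence embedding: mean of deterministic random vectors
    assigned to character trigrams. A rough, illustrative stand-in for
    the lightweight FastText/Word2Vec embeddings used in the paper."""
    text = f"  {sentence.lower()}  "
    grams = [text[i:i + 3] for i in range(len(text) - 2)]
    vec = np.zeros(dim)
    for g in grams:
        # Seed from a stable checksum so the same trigram always maps
        # to the same vector across runs.
        rng = np.random.default_rng(zlib.crc32(g.encode("utf-8")))
        vec += rng.standard_normal(dim)
    return vec / max(len(grams), 1)


def omission_scores(source_sents, summary_sents, bandwidth=0.5):
    """Fit a KDE over the summary's sentence embeddings; source sentences
    with low log-density under it are poorly covered by the summary and
    are scored as likely omissions (higher score = more likely omitted)."""
    summary_vecs = np.stack([embed(s) for s in summary_sents])
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(summary_vecs)
    source_vecs = np.stack([embed(s) for s in source_sents])
    return -kde.score_samples(source_vecs)
```

A source sentence that also appears in the summary receives a low omission score, while a source sentence with no counterpart in the summary receives a high one; thresholding these scores yields a reference-free omission flag without querying the LLM itself.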
Détection des omissions dans les résumés médicaux générés par les grands modèles de langue
Achir Oukelmoun | Nasredine Semmar | Gaël de Chalendar | Clément Cormi | Mariame Oukelmoun | Eric Vibert | Marc-Antoine Allard
Proceedings of the 32nd Conference on Natural Language Processing (TALN), Volume 1: Original Scientific Papers
Large Language Models (LLMs) are increasingly used to summarize medical texts, but they risk omitting critical information, thereby compromising decision-making. Unlike hallucinations, omissions concern essential facts that are absent from the output. This paper introduces a validated French dataset for detecting such omissions and proposes EmbedKDECheck, a frugal, reference-free approach. In contrast to LLM-based methods, this approach uses word embeddings from a lightweight Natural Language Processing (NLP) model combining FastText and Word2Vec through a dedicated algorithm, coupled with an unsupervised model that provides an anomaly score. It identifies omissions effectively at low computational cost. EmbedKDECheck was evaluated against state-of-the-art frameworks (SelfCheckGPT, ChainPoll, G-Eval, and GPTScore) and showed good performance. Our method strengthens the evaluation of LLM reliability and contributes to safer medical decision-making.