Detecting Omissions in LLM-Generated Medical Summaries
Achir Oukelmoun, Nasredine Semmar, Gaël de Chalendar, Clement Cormi, Mariame Oukelmoun, Eric Vibert, Marc-Antoine Allard
Abstract
With the emergence of Large Language Models (LLMs), numerous use cases have arisen in the medical field, particularly the generation of summaries for consultation transcriptions and extensive medical reports. A major concern is that these summaries may omit critical information from the original input, potentially jeopardizing the decision-making process. This issue of omission is distinct from hallucination, which involves generating incorrect or fabricated facts. To address omissions, this paper introduces a dataset designed to evaluate such issues and proposes a frugal approach called EmbedKDECheck for detecting omissions in LLM-generated texts. The dataset, created in French, has been validated by medical experts to ensure it accurately represents real-world scenarios in the medical field. The objective is to develop a reference-free (black-box) method that can evaluate the reliability of summaries or reports without requiring significant computational resources, relying only on the input and output. Unlike methods that rely on embeddings derived from the LLM itself, our approach uses embeddings generated by a third-party, lightweight NLP model based on a combination of FastText and Word2Vec. These embeddings are then combined with anomaly detection models to identify omissions effectively, making the method well suited for resource-constrained environments. EmbedKDECheck was benchmarked against black-box state-of-the-art frameworks and models, including SelfCheckGPT, ChainPoll, and G-Eval, which leverage GPT. Results demonstrate satisfactory performance in detecting omissions in LLM-generated summaries. This work advances frugal methodologies for evaluating the reliability of LLM-generated texts, with significant potential to improve the safety and accuracy of medical decision support systems in surgery and other healthcare domains.
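As a rough illustration of the idea sketched in the abstract (lightweight embeddings combined with anomaly detection), the Python snippet below embeds source and summary sentences with a small FastText model and fits a kernel density estimate on the summary-sentence embeddings; source sentences that fall in low-density regions are flagged as candidate omissions. The function names, the gensim/scikit-learn APIs, and the threshold heuristic are illustrative assumptions and do not reproduce the paper's EmbedKDECheck implementation.

```python
# Minimal sketch, assuming embeddings + KDE anomaly scoring as in the abstract.
# Names, libraries, and the cutoff heuristic are illustrative, not the authors' code.
import numpy as np
from gensim.models import FastText
from sklearn.neighbors import KernelDensity


def embed_sentence(model, sentence):
    """Mean-pool subword-aware word vectors into a sentence embedding."""
    tokens = sentence.lower().split()
    if not tokens:
        return np.zeros(model.vector_size)
    return np.mean([model.wv[t] for t in tokens], axis=0)


def detect_omissions(source_sentences, summary_sentences, margin=2.0):
    """Return (sentence, log-density) pairs for source sentences whose
    embedding lies in a low-density region of the summary embeddings."""
    # Stand-in for the pretrained FastText/Word2Vec model mentioned in the
    # abstract: train a tiny FastText model on the documents themselves.
    corpus = [s.lower().split() for s in source_sentences + summary_sentences]
    ft = FastText(sentences=corpus, vector_size=64, window=5, min_count=1, epochs=30)

    summary_emb = np.stack([embed_sentence(ft, s) for s in summary_sentences])
    kde = KernelDensity(kernel="gaussian", bandwidth=1.0).fit(summary_emb)

    # Heuristic cutoff: anything less dense than the least dense summary
    # sentence, by more than `margin` nats, is a candidate omission.
    cutoff = kde.score_samples(summary_emb).min() - margin

    flagged = []
    for sent in source_sentences:
        score = kde.score_samples(embed_sentence(ft, sent).reshape(1, -1))[0]
        if score < cutoff:
            flagged.append((sent, score))
    return flagged


if __name__ == "__main__":
    source = [
        "The patient reports abdominal pain for three days.",
        "An allergy to penicillin was documented in 2019.",
        "Blood pressure was measured at 140 over 90.",
    ]
    summary = ["Patient with three days of abdominal pain and elevated blood pressure."]
    for sent, score in detect_omissions(source, summary):
        print(f"possible omission (log-density {score:.1f}): {sent}")
```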
- Anthology ID: 2025.emnlp-industry.22
- Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month: November
- Year: 2025
- Address: Suzhou (China)
- Editors: Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 325–337
- URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.22/
- Cite (ACL): Achir Oukelmoun, Nasredine Semmar, Gaël de Chalendar, Clement Cormi, Mariame Oukelmoun, Eric Vibert, and Marc-Antoine Allard. 2025. Detecting Omissions in LLM-Generated Medical Summaries. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 325–337, Suzhou (China). Association for Computational Linguistics.
- Cite (Informal): Detecting Omissions in LLM-Generated Medical Summaries (Oukelmoun et al., EMNLP 2025)
- PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.22.pdf