Patrick Schrempf

2026

Importance of Prompt Optimisation for Error Detection in Medical Notes Using Language Models
Craig Myles | Patrick Schrempf | David Harris-Birtill
Proceedings of the 1st Workshop on Linguistic Analysis for Health (HeaLing 2026)

Errors in medical text can cause delays or even result in incorrect treatment for patients. Recently, language models have shown promise in automatically detecting errors in medical text, an ability that could significantly benefit healthcare systems. In this paper, we explore the importance of prompt optimisation for small and large language models applied to the task of error detection. We perform rigorous experiments and analysis across frontier and open-source language models. We show that automatic prompt optimisation with Genetic-Pareto (GEPA) improves error detection accuracy over the baseline from 0.669 to 0.785 with GPT-5 and from 0.578 to 0.690 with Qwen3-32B, approaching the performance of medical doctors and achieving state-of-the-art performance on the MEDEC benchmark dataset. Code is available on GitHub: https://github.com/CraigMyles/clinical-note-error-detection

2019


Ontological attention ensembles for capturing semantic concepts in ICD code prediction from clinical text
Matus Falis | Maciej Pajak | Aneta Lisowska | Patrick Schrempf | Lucas Deckers | Shadia Mikhael | Sotirios Tsaftaris | Alison O’Neil
Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019)

We present a semantically interpretable system for automated ICD coding of clinical text documents. Our contribution is an ontological attention mechanism that mirrors the structure of the ICD ontology: shared attention vectors are learned at each level of the hierarchy and combined into label-dependent ensembles. Analysis of the attention heads shows that shared concepts are learned by the lowest common ancestor node. This allows child nodes to focus on the differentiating concepts, leading to efficient learning and memory usage. Visualising the multi-level attention on the original text allows the code predictions to be explained according to the semantics of the ICD ontology. On the MIMIC-III dataset we achieve a 2.7% absolute (11% relative) improvement in macro-F1 score, from 0.218 to 0.245, compared to the previous state of the art across 3,912 codes. Finally, we analyse the labelling inconsistencies arising from different coding practices, which limit performance on this task.
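The idea of shared attention vectors per hierarchy level, combined into a per-code ensemble, can be illustrated with a toy sketch. Everything below is an assumption for illustration and not the authors' implementation. Token embeddings are given directly, with no text encoder. Each ontology node owns one shared query vector used in plain dot-product attention. The label-dependent ensemble is a simple mean over the contexts of the nodes on a code's path. The node names (`chapter_A`, `block_A1`, `code_A1a`, `code_A1b`) and all numbers are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(tokens, query):
    """Dot-product attention: softmax(query . token) weights, then a weighted sum of tokens."""
    weights = softmax([dot(query, t) for t in tokens])
    dim = len(tokens[0])
    return [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]

def ensemble_context(tokens, path, attention, cache):
    """Hypothetical label-dependent ensemble: the mean of the contexts produced by
    the shared attention vector of every node on the code's path in the ontology.
    Contexts for shared ancestors are computed once and reused via `cache`."""
    contexts = []
    for node in path:
        if node not in cache:
            cache[node] = attend(tokens, attention[node])
        contexts.append(cache[node])
    dim = len(tokens[0])
    return [sum(c[d] for c in contexts) / len(contexts) for d in range(dim)]

def predict(tokens, paths, attention, output_weights):
    """Per-code probability: sigmoid of a code-specific linear score on its ensemble context."""
    cache = {}
    probs = {}
    for code, path in paths.items():
        ctx = ensemble_context(tokens, path, attention, cache)
        probs[code] = 1.0 / (1.0 + math.exp(-dot(output_weights[code], ctx)))
    return probs, cache

# Toy data: three 2-dimensional token embeddings and a tiny hypothetical hierarchy.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
attention = {
    "chapter_A": [0.2, 0.1],
    "block_A1": [1.0, -1.0],
    "code_A1a": [0.0, 2.0],
    "code_A1b": [2.0, 0.0],
}
paths = {
    "code_A1a": ["chapter_A", "block_A1", "code_A1a"],
    "code_A1b": ["chapter_A", "block_A1", "code_A1b"],
}
output_weights = {"code_A1a": [1.0, -0.5], "code_A1b": [-0.5, 1.0]}

probs, cache = predict(tokens, paths, attention, output_weights)
print(probs)
print(sorted(cache))
```

The two sibling codes share `chapter_A` and `block_A1`, so only four attention contexts are computed instead of six. Only the leaf vectors differ between siblings. This loosely mirrors the abstract's point that shared concepts sit in common ancestor nodes while child nodes capture the differentiating ones.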
