Ben Holgate
2026
Using Synthetic Records to Improve Automated Identification of Seizure Freedom in Clinical Text about People with Epilepsy
Stephen Barlow | Yujian Gan | Joe Davies | Joel Winston | James Teo | Mark Richardson | Ben Holgate
BioNLP 2026
Seizure freedom is a key clinical outcome for people with epilepsy (PWE), yet in the United Kingdom it is primarily recorded in free-text notes and letters, making it difficult to aggregate and track at scale. This paper introduces a generative LLM-based pipeline, boosted by synthetic data, to identify a PWE’s seizure freedom status in clinicians’ records. We fine-tuned seven different LLMs with between 4 and 14 billion parameters using LoRA to compare models trained on synthetic records against those trained on expert-annotated records. The best-performing configuration, based on Qwen-2.5-14B, was trained entirely on synthetic records and used chain-of-thought (CoT) reasoning (both generated by GPT-5). This achieved an F1 score of 0.90±0.02 on double-annotated test data and outperformed the equivalent model trained on authentic clinician records, which achieved 0.87±0.04. The synthetically trained models also output their CoT reasoning, giving greater transparency into their decisions, and they free the otherwise unused supervised training data to serve as a significantly larger set of test examples. This work has implications for monitoring a key treatment outcome for PWE automatically and at scale.
2025
Fine-tuning LLMs to Extract Epilepsy Seizure Frequency Data from Health Records
Ben Holgate | Joe Davies | Shichao Fang | Joel Winston | James Teo | Mark Richardson
Proceedings of the 24th Workshop on Biomedical Language Processing
We developed a new methodology for extracting the frequency of a patient’s epilepsy seizures from unstructured, free-text outpatient clinic letters by: first, devising a single unit of measurement for seizure frequency; and second, fine-tuning a generative Large Language Model (LLM) on our bespoke annotated dataset. We measured frequency as the number of seizures per month: one seizure or more is recorded as an integer, and fewer than one as a decimal. This approach enables us to track whether a patient’s seizures are improving over time. We found that fine-tuning improves the F1 score of our best-performing LLM, Ministral-8B-Instruct-2410, roughly threefold compared to an untrained model. We also found that Ministral demonstrated an impressive ability for mathematical reasoning.
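The seizures-per-month unit described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `seizures_per_month`, the `UNITS_IN_MONTHS` table, the average month length of 365.25/12 days, and the rounding to two decimal places are all assumptions made for illustration.

```python
# Hypothetical helper: normalise "N seizures per period" to seizures per month.
# Assumption: an average month is 365.25 / 12 days long.
UNITS_IN_MONTHS = {
    "day": 12 / 365.25,
    "week": 7 * 12 / 365.25,
    "month": 1.0,
    "year": 12.0,
}


def seizures_per_month(count, period_length, unit):
    """Return the monthly seizure rate for `count` seizures in a period.

    Rates of one or more are returned as an integer; rates below one are
    returned as a decimal (rounded to two places, an assumed precision).
    """
    if period_length <= 0:
        raise ValueError("period_length must be positive")
    months = period_length * UNITS_IN_MONTHS[unit]
    rate = count / months
    return round(rate) if rate >= 1 else round(rate, 2)


if __name__ == "__main__":
    print(seizures_per_month(2, 1, "month"))  # two seizures a month
    print(seizures_per_month(3, 6, "month"))  # three seizures in six months
    print(seizures_per_month(1, 1, "year"))   # one seizure a year
```

Putting every mention on one scale means that two letters written months apart can be compared directly, which is what makes tracking improvement over time possible.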
2024
Extracting Epilepsy Patient Data with Llama 2
Ben Holgate | Shichao Fang | Anthony Shek | Matthew McWilliam | Pedro Viana | Joel S. Winston | James T. Teo | Mark P. Richardson
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
We fill a gap in scholarship by applying a generative Large Language Model (LLM) to extract information from clinical free text about the frequency of seizures experienced by people with epilepsy. Seizure frequency is difficult to determine across time from unstructured doctors’ and nurses’ reports of outpatients’ visits that are stored in Electronic Health Records (EHRs) in the United Kingdom’s National Health Service (NHS). We employ Meta’s Llama 2 to mine the EHRs of people with epilepsy and determine, where possible, a person’s seizure frequency at a given point in time. The results demonstrate that the new, powerful generative LLMs may improve outcomes for clinical NLP research in epilepsy and other areas.