Joel Winston


2025

pdf bib
Fine-tuning LLMs to Extract Epilepsy Seizure Frequency Data from Health Records
Ben Holgate | Joe Davies | Shichao Fang | Joel Winston | James Teo | Mark Richardson
Proceedings of the 24th Workshop on Biomedical Language Processing

We developed a new methodology of extracting the frequency of a patient’s epilepsy seizures from unstructured, free-text outpatient clinic letters by: first, devising a singular unit of measurement for seizure frequency; and second, fine-tuning a generative Large Language Model (LLM) on our bespoke annotated dataset. We measured frequency by the number of seizures per month: one seizure or more requires an integer; and less than one a decimal. This approach enables us to track whether a patient”s seizures are improving or not over time. We found fine-tuning improves the F1 score of our best-performing LLM, Ministral-8B-Instruct-2410, by around three times compared to an untrained model. We also found Ministral demonstrated an impressive ability for mathematical reasoning.