Fine-tuning LLMs to Extract Epilepsy Seizure Frequency Data from Health Records

Ben Holgate, Joe Davies, Shichao Fang, Joel Winston, James Teo, Mark Richardson


Abstract
We developed a new methodology for extracting the frequency of a patient’s epilepsy seizures from unstructured, free-text outpatient clinic letters. First, we devised a single unit of measurement for seizure frequency. Second, we fine-tuned a generative Large Language Model (LLM) on our bespoke annotated dataset. We measured frequency as the number of seizures per month: a frequency of one or more seizures per month is expressed as an integer, and a frequency of less than one as a decimal. This approach enables us to track whether a patient’s seizures are improving over time. We found that fine-tuning roughly tripled the F1 score of our best-performing LLM, Ministral-8B-Instruct-2410, compared with the same model without fine-tuning. We also found that Ministral demonstrated an impressive ability for mathematical reasoning.
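The seizures-per-month unit described above can be illustrated with a minimal sketch. In the paper, the fine-tuned LLM produces this value directly from free text. The sketch below shows only the arithmetic of the unit once a seizure count and a reporting period have been identified. The function names `seizures_per_month` and `trend`, the period vocabulary, the 365.25 / 12-day month and the two-decimal precision for sub-monthly rates are all illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of the "seizures per month" unit; not the authors' implementation.
# Assumption: a month is 365.25 / 12 days; the paper's exact conversion may differ.
DAYS_PER_UNIT = {
    "day": 1.0,
    "week": 7.0,
    "month": 365.25 / 12,
    "year": 365.25,
}


def seizures_per_month(count: float, period: float, unit: str) -> float:
    """Normalise `count` seizures observed over `period` units of `unit` to a monthly rate."""
    if period <= 0:
        raise ValueError("period must be positive")
    days = period * DAYS_PER_UNIT[unit]
    rate = count * DAYS_PER_UNIT["month"] / days
    # Rule from the abstract: one or more per month -> integer; below one -> decimal.
    # Assumption: decimals are kept to two places; Python's round() rounds halves to even.
    if rate >= 1:
        return int(round(rate))
    return round(rate, 2)


def trend(earlier: float, later: float) -> str:
    """Compare two monthly rates from successive letters (hypothetical helper)."""
    if later < earlier:
        return "improving"
    if later > earlier:
        return "worsening"
    return "unchanged"


if __name__ == "__main__":
    before = seizures_per_month(2, 1, "week")   # "two seizures a week"
    after = seizures_per_month(1, 6, "month")   # "one seizure in six months"
    print(before, after, trend(before, after))
```

Because every letter is reduced to the same unit, two letters written months apart can be compared with a plain numeric comparison, which is what makes tracking improvement over time straightforward.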
Anthology ID:
2025.bionlp-1.5
Volume:
Proceedings of the 24th Workshop on Biomedical Language Processing
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Junichi Tsujii
Venues:
BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
44–55
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.5/
Cite (ACL):
Ben Holgate, Joe Davies, Shichao Fang, Joel Winston, James Teo, and Mark Richardson. 2025. Fine-tuning LLMs to Extract Epilepsy Seizure Frequency Data from Health Records. In Proceedings of the 24th Workshop on Biomedical Language Processing, pages 44–55, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Fine-tuning LLMs to Extract Epilepsy Seizure Frequency Data from Health Records (Holgate et al., BioNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.5.pdf
Supplementary material:
2025.bionlp-1.5.SupplementaryMaterial.txt