Using Synthetic Records to Improve Automated Identification of Seizure Freedom in Clinical Text about People with Epilepsy

Stephen Barlow, Yujian Gan, Joe Davies, Joel Winston, James Teo, Mark Richardson, Ben Holgate


Abstract
Seizure freedom is a key clinical outcome for people with epilepsy (PWE), yet in the United Kingdom it is recorded primarily in free-text notes and letters, making it difficult to aggregate and track at scale. This paper introduces a generative LLM-based pipeline, boosted by synthetic data, that identifies a PWE’s seizure freedom status in clinicians’ records. We fine-tuned seven LLMs with between 4 and 14 billion parameters using LoRA to compare models trained on synthetic records against those trained on expert-annotated records. The best-performing configuration, based on Qwen-2.5-14B, was trained entirely on synthetic records with chain-of-thought (CoT) reasoning, both generated by GPT-5. It achieved an F1 score of 0.90±0.02 on double-annotated test data, outperforming the equivalent model trained on authentic clinician records, which achieved 0.87±0.04. The synthetically trained models have the further benefit of outputting their CoT reasoning, which makes their decisions more transparent, and they free the otherwise unused supervised training data to serve as additional test examples, substantially enlarging the test set. This work has implications for monitoring a key treatment outcome for PWE automatically and at scale.
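For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score and a mean ± standard deviation summary could be computed. It is not the authors' evaluation code: the label strings, the function names `f1_score` and `summarize`, and the assumption that the ± value is a sample standard deviation over repeated runs are all illustrative and are not stated in the abstract.

```python
from statistics import mean, stdev

# Hypothetical label for the positive class; the paper's actual label
# scheme is not given in the abstract.
POSITIVE = "seizure-free"


def f1_score(gold, pred, positive=POSITIVE):
    """Return the F1 score of `pred` against `gold` for one positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize(scores):
    """Return (mean, sample standard deviation) of per-run scores.

    Assumption: the reported spread comes from repeated runs; the
    abstract does not say how it was derived.
    """
    return mean(scores), stdev(scores)
```

Under these assumptions, calling `summarize` on a list of per-run F1 scores gives a pair that could be reported in the "mean±spread" form used above.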
Anthology ID:
2026.bionlp-1.3
Volume:
BioNLP 2026
Month:
July
Year:
2026
Address:
San Diego, California
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues:
BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
20–30
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.3/
Cite (ACL):
Stephen Barlow, Yujian Gan, Joe Davies, Joel Winston, James Teo, Mark Richardson, and Ben Holgate. 2026. Using Synthetic Records to Improve Automated Identification of Seizure Freedom in Clinical Text about People with Epilepsy. In BioNLP 2026, pages 20–30, San Diego, California. Association for Computational Linguistics.
Cite (Informal):
Using Synthetic Records to Improve Automated Identification of Seizure Freedom in Clinical Text about People with Epilepsy (Barlow et al., BioNLP 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.3.pdf