A Cheap Lunch: Synthetic Annotation With Reduced Human Effort for Medical Text Mining

Shutao Chen, Piek T.J.M. Vossen


Abstract
Electronic Health Records are rich sources of patient information, including knowledge about the functioning of patients as defined by the WHO in the International Classification of Functioning (ICF). However, patient notes have yet to be explored because this knowledge is packaged in the sometimes cryptic language exchanged between caretakers. Recent research has started to use NLP techniques to extract this knowledge, but such approaches often require laborious annotation. In this paper, we report on how the annotation can (partly) be done by a generative LLM, both for ICF categories that were previously annotated manually and for new ICF categories for which no annotation existed. We show that a domain-specific encoder finetuned with both manual and synthetic annotations outperforms one finetuned with only the manual annotations on a dedicated test set that was adapted for the new categories with minimal manual effort. We also assess the quality of the synthetic annotations of the training data. Our process shows how competitive text classifiers for medical text mining can be developed and extended to new categories with minimal manual effort by experts.
Anthology ID:
2026.lrec-main.813
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
10353–10364
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.813/
Cite (ACL):
Shutao Chen and Piek T.J.M. Vossen. 2026. A Cheap Lunch: Synthetic Annotation With Reduced Human Effort for Medical Text Mining. International Conference on Language Resources and Evaluation, main:10353–10364.
Cite (Informal):
A Cheap Lunch: Synthetic Annotation With Reduced Human Effort for Medical Text Mining (Chen & Vossen, LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.813.pdf