Warren Del-Pinto
2025
LT3: Generating Medication Prescriptions with Conditional Transformer
Samuel Belkadi, Nicolo Micheletti, Lifeng Han, Warren Del-Pinto, Goran Nenadic
Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health)
Beyond Reconstruction: Generating Privacy-Preserving Clinical Letters
Libo Ren, Samuel Belkadi, Lifeng Han, Warren Del-Pinto, Goran Nenadic
Proceedings of the Sixth Workshop on Privacy in Natural Language Processing
Due to the sensitive nature of clinical letters, their use in model training, medical research, and education is limited. This work aims to generate diverse, de-identified, and high-quality synthetic clinical letters to enhance privacy protection. This study explores various pre-trained language models (PLMs) for text masking and generation, using several masking strategies with a focus on Bio_ClinicalBERT. Both qualitative and quantitative methods are used for evaluation, supplemented by a downstream Named Entity Recognition (NER) task. Our results indicate that encoder-only models outperform encoder-decoder models. General-domain and clinical-domain PLMs exhibit comparable performance when clinical information is preserved. Preserving clinical entities and document structure yields better performance than fine-tuning alone. Masking stopwords enhances text quality, whereas masking nouns or verbs has a negative impact. BERTScore proves to be the most reliable quantitative evaluation metric in our task. Contextual information has minimal impact, indicating that synthetic letters can effectively replace original ones in downstream tasks. Unlike previous studies that focus primarily on reconstructing original letters or training a privacy-detection and substitution model, this project provides a framework for generating diverse clinical letters while embedding privacy detection, enabling sensitive dataset expansion and facilitating the use of real-world clinical data. Our code and trained models will be publicly available at https://github.com/HECTA-UoM/Synthetic4Health.
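To make the masking idea concrete, here is a minimal Python sketch of stopword masking that leaves protected spans untouched. It assumes the text is already split into word tokens and that the positions of clinical entities are known in advance (in practice they would come from an NER step). The stopword list, the `mask_tokens` and `is_stopword` helpers, and the example sentence are hypothetical illustrations, not the authors' implementation.

```python
import random

# Illustrative stopword list (an assumption); a real pipeline would use a
# full list such as NLTK's or spaCy's.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "was", "with", "on", "in", "for", "is"}

# Mask token used by BERT-style tokenizers, including Bio_ClinicalBERT.
MASK = "[MASK]"


def is_stopword(token):
    """Default eligibility rule (hypothetical): only stopwords may be masked."""
    return token.lower() in STOPWORDS


def mask_tokens(tokens, protected, ratio, rng, eligible=is_stopword):
    """Replace a fraction of eligible tokens with [MASK] (hypothetical helper).

    tokens    -- list of word-level tokens
    protected -- set of token indices that must be kept verbatim
                 (e.g. clinical entities or structural markers)
    ratio     -- fraction in [0, 1] of eligible tokens to mask
    rng       -- random.Random instance, so runs are reproducible
    eligible  -- predicate selecting maskable tokens (stopwords by default)
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Candidate positions: eligible tokens outside every protected span.
    candidates = [i for i, tok in enumerate(tokens)
                  if i not in protected and eligible(tok)]
    k = round(len(candidates) * ratio)
    chosen = set(rng.sample(candidates, k))
    return [MASK if i in chosen else tok for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    tokens = "The patient was started on metformin 500 mg for type 2 diabetes".split()
    # Hypothetical entity positions: "metformin 500 mg" and "type 2 diabetes".
    protected = {5, 6, 7, 9, 10, 11}
    masked = mask_tokens(tokens, protected, ratio=0.5, rng=random.Random(0))
    print(" ".join(masked))
```

The masked sequence would then be passed to an encoder-only masked language model, for example Bio_ClinicalBERT through the Hugging Face `transformers` fill-mask pipeline, to propose replacement words. Swapping the `eligible` predicate for a part-of-speech check would approximate the noun or verb masking that the abstract reports as harmful to text quality.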