Can Synthetic Text Help Clinical Named Entity Recognition? A Study of Electronic Health Records in French

Nicolas Hiebel, Olivier Ferret, Karen Fort, Aurélie Névéol


Abstract
In sensitive domains, the sharing of corpora is restricted due to confidentiality, copyright or trade secrets. Automatic text generation can help alleviate these issues by producing synthetic texts that mimic the linguistic properties of real documents while preserving confidentiality. In this study, we assess the usability of a synthetic corpus as a substitute training corpus for clinical information extraction. Our goal is to automatically produce a clinical case corpus annotated with clinical entities and to evaluate it on a named entity recognition (NER) task. We use two auto-regressive neural models, partially or fully trained on generic French texts and fine-tuned on clinical cases, to produce a corpus of synthetic clinical cases. We study variants of the generation process: (i) fine-tuning on annotated vs. plain text (in the latter case, annotations are obtained a posteriori) and (ii) selection of generated texts based on model parameters and filtering criteria. We then train NER models on the resulting synthetic text and evaluate them on a gold standard clinical corpus. Our experiments suggest that synthetic text is useful for clinical NER.
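The final step described in the abstract, evaluating a NER model against a gold standard corpus, can be illustrated with a minimal sketch. The abstract does not state which metric was used, so this assumes exact-match, entity-level precision, recall and F1, a common choice for NER. The function name `entity_prf`, the span representation and the labels `DISO`, `CHEM` and `ANAT` are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# An entity is represented as (start_token, end_token, label).
# This representation is an assumption made for illustration only.
Span = Tuple[int, int, str]


def entity_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1.

    A predicted entity counts as correct only if its boundaries
    and its label both match a gold entity.
    """
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold annotations and model predictions for one document.
gold = {(0, 2, "DISO"), (5, 6, "CHEM"), (9, 10, "ANAT")}
pred = {(0, 2, "DISO"), (5, 6, "ANAT"), (9, 10, "ANAT")}

precision, recall, f1 = entity_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy example, two of the three predicted entities match the gold annotations exactly; the middle one has the right boundaries but the wrong label, so it counts as an error for both precision and recall.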
Anthology ID:
2023.eacl-main.170
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
2312–2330
URL:
https://aclanthology.org/2023.eacl-main.170
Cite (ACL):
Nicolas Hiebel, Olivier Ferret, Karen Fort, and Aurélie Névéol. 2023. Can Synthetic Text Help Clinical Named Entity Recognition? A Study of Electronic Health Records in French. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 2312–2330, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Can Synthetic Text Help Clinical Named Entity Recognition? A Study of Electronic Health Records in French (Hiebel et al., EACL 2023)
PDF:
https://preview.aclanthology.org/author-url/2023.eacl-main.170.pdf