Large Language Models as Instructors: A Study on Multilingual Clinical Entity Extraction

Simon Meoni, Eric De la Clergerie, Theo Ryffel


Abstract
In clinical and other specialized domains, data are scarce due to their confidential nature. This lack of data is a major problem when fine-tuning language models. Although very large language models (LLMs) show promise for the medical domain, they cannot be used directly inside healthcare facilities because of data confidentiality issues. We explore an approach in which LLMs annotate training data that is then used to fine-tune smaller models better suited to our task, and we show that this method yields promising results on information extraction tasks.
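
As a rough illustration of the pipeline the abstract describes, the sketch below shows how an instructor LLM could pseudo-annotate clinical text and how the resulting silver annotations could be projected onto BIO tags for fine-tuning a smaller token-classification model. The label set, prompt wording, and the call_llm interface are illustrative assumptions, not the authors' exact setup.

# Minimal sketch of LLM-based pseudo-annotation for clinical entity extraction.
# Assumptions: the entity labels, the prompt format, and the `call_llm` callable
# are hypothetical stand-ins, not the method reported in the paper.

from typing import Callable, List, Tuple

LABELS = ["DISORDER", "DRUG", "PROCEDURE"]  # hypothetical entity types

PROMPT_TEMPLATE = (
    "Extract all clinical entities from the text below.\n"
    "Return one entity per line as 'surface form' and 'label' separated by a tab, "
    f"using only labels from {LABELS}.\n\n"
    "Text: {text}"
)

def pseudo_annotate(text: str, call_llm: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Ask the instructor LLM for (surface form, label) pairs."""
    response = call_llm(PROMPT_TEMPLATE.format(text=text))
    pairs = []
    for line in response.splitlines():
        if "\t" in line:
            surface, label = line.split("\t", 1)
            if label.strip() in LABELS:
                pairs.append((surface.strip(), label.strip()))
    return pairs

def to_bio(text: str, pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Project entity mentions back onto whitespace tokens as BIO tags."""
    tokens = text.split()
    tags = ["O"] * len(tokens)
    for surface, label in pairs:
        ent_tokens = surface.split()
        for i in range(len(tokens) - len(ent_tokens) + 1):
            if tokens[i:i + len(ent_tokens)] == ent_tokens and tags[i] == "O":
                tags[i] = f"B-{label}"
                for j in range(1, len(ent_tokens)):
                    tags[i + j] = f"I-{label}"
                break
    return list(zip(tokens, tags))

if __name__ == "__main__":
    # Stand-in for a real LLM API call (illustrative assumption only).
    fake_llm = lambda prompt: "type 2 diabetes\tDISORDER\nmetformin\tDRUG"
    text = "Patient with type 2 diabetes started on metformin ."
    silver = to_bio(text, pseudo_annotate(text, fake_llm))
    print(silver)  # BIO-tagged tokens, usable to fine-tune a smaller NER model
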
Anthology ID: 2023.bionlp-1.15
Volume: The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Dina Demner-Fushman, Sophia Ananiadou, Kevin Cohen
Venue: BioNLP
Publisher: Association for Computational Linguistics
Pages: 178–190
URL: https://aclanthology.org/2023.bionlp-1.15
DOI: 10.18653/v1/2023.bionlp-1.15
Cite (ACL): Simon Meoni, Eric De la Clergerie, and Theo Ryffel. 2023. Large Language Models as Instructors: A Study on Multilingual Clinical Entity Extraction. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 178–190, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Large Language Models as Instructors: A Study on Multilingual Clinical Entity Extraction (Meoni et al., BioNLP 2023)
PDF: https://preview.aclanthology.org/landing_page/2023.bionlp-1.15.pdf