Abstract
We evaluate the effectiveness of data augmentation for improving the generalizability of a Named Entity Recognition model for medication identification in clinical notes. We compare two contrasting data augmentation methods, mention replacement and a generative model, for creating synthetic training examples. Through experiments on the n2c2 2022 Track 1 Contextualized Medication Event Extraction data set, we show that augmenting the training data with supplemental examples generated by GPT-3 can boost the performance of a transformer-based model when the training set is small.
- Anthology ID:
- 2023.ranlp-1.63
- Volume:
- Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
- Month:
- September
- Year:
- 2023
- Address:
- Varna, Bulgaria
- Editors:
- Ruslan Mitkov, Galia Angelova
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd., Shoumen, Bulgaria
- Pages:
- 578–585
- URL:
- https://aclanthology.org/2023.ranlp-1.63
- Cite (ACL):
- Jordan Koontz, Maite Oronoz, and Alicia Pérez. 2023. Evaluating Data Augmentation for Medication Identification in Clinical Notes. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 578–585, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
- Cite (Informal):
- Evaluating Data Augmentation for Medication Identification in Clinical Notes (Koontz et al., RANLP 2023)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/2023.ranlp-1.63.pdf