Is artificial data useful for biomedical Natural Language Processing algorithms?

Zixu Wang, Julia Ive, Sumithra Velupillai, Lucia Specia


Abstract
A major obstacle to the development of Natural Language Processing (NLP) methods in the biomedical domain is data accessibility. This problem can be addressed by generating medical data artificially. Most previous studies have focused on the generation of short clinical text, and evaluation of the data utility has been limited. We propose a generic methodology to guide the generation of clinical text with key phrases. We use the artificial data as additional training data in two key biomedical NLP tasks: text classification and temporal relation extraction. We show that artificially generated training data used in conjunction with real training data can lead to performance boosts for data-greedy neural network algorithms. We also demonstrate the usefulness of the generated data for NLP setups where it fully replaces real training data.
Anthology ID:
W19-5026
Volume:
Proceedings of the 18th BioNLP Workshop and Shared Task
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
240–249
URL:
https://aclanthology.org/W19-5026
DOI:
10.18653/v1/W19-5026
Cite (ACL):
Zixu Wang, Julia Ive, Sumithra Velupillai, and Lucia Specia. 2019. Is artificial data useful for biomedical Natural Language Processing algorithms? In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 240–249, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Is artificial data useful for biomedical Natural Language Processing algorithms? (Wang et al., BioNLP 2019)
PDF:
https://preview.aclanthology.org/add_acl24_videos/W19-5026.pdf
Data
MIMIC-III