Data Augmentation for Rare Symptoms in Vaccine Side-Effect Detection

Bosung Kim, Ndapa Nakashole


Abstract
We study the problem of entity detection and normalization applied to patient self-reports of symptoms that arise as side-effects of vaccines. Our application domain presents unique challenges that render traditional classification methods ineffective: the number of entity types is large, and many symptoms are rare, resulting in a long-tail distribution of training examples per entity type. We tackle these challenges with an autoregressive model that generates standardized names of symptoms. We introduce a data augmentation technique to increase the number of training examples for rare symptoms. Experiments on real-life patient vaccine symptom self-reports show that our approach outperforms strong baselines, and that additional examples improve performance on the long-tail entities.
Anthology ID:
2022.bionlp-1.29
Volume:
Proceedings of the 21st Workshop on Biomedical Language Processing
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
Publisher:
Association for Computational Linguistics
Pages:
310–315
URL:
https://aclanthology.org/2022.bionlp-1.29
DOI:
10.18653/v1/2022.bionlp-1.29
Cite (ACL):
Bosung Kim and Ndapa Nakashole. 2022. Data Augmentation for Rare Symptoms in Vaccine Side-Effect Detection. In Proceedings of the 21st Workshop on Biomedical Language Processing, pages 310–315, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Data Augmentation for Rare Symptoms in Vaccine Side-Effect Detection (Kim & Nakashole, BioNLP 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-3/2022.bionlp-1.29.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-3/2022.bionlp-1.29.mp4