Abstract
Data augmentation is important for addressing data sparsity and low-resource settings in NLP. Unlike data augmentation for other tasks, such as sentence-level and sentence-pair tasks, data augmentation for named entity recognition (NER) requires preserving the semantics of entities. To that end, we propose a simple semantic-based data augmentation method for biomedical NER. Our method leverages semantic information from pre-trained language models at both the entity level and the sentence level. Experimental results on two datasets, i2b2-2010 (English) and VietBioNER (Vietnamese), showed that the proposed method could improve NER performance.
- Anthology ID:
- 2022.bionlp-1.12
- Volume:
- Proceedings of the 21st Workshop on Biomedical Language Processing
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
- Venue:
- BioNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 123–129
- URL:
- https://aclanthology.org/2022.bionlp-1.12
- DOI:
- 10.18653/v1/2022.bionlp-1.12
- Cite (ACL):
- Uyen Phan and Nhung Nguyen. 2022. Simple Semantic-based Data Augmentation for Named Entity Recognition in Biomedical Texts. In Proceedings of the 21st Workshop on Biomedical Language Processing, pages 123–129, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- Simple Semantic-based Data Augmentation for Named Entity Recognition in Biomedical Texts (Phan & Nguyen, BioNLP 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/2022.bionlp-1.12.pdf