Abstract
Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (i2b2-2010 and MaSciP), we show that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets.
- Anthology ID:
- 2020.coling-main.343
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Donia Scott, Nuria Bel, Chengqing Zong
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 3861–3867
- URL:
- https://aclanthology.org/2020.coling-main.343
- DOI:
- 10.18653/v1/2020.coling-main.343
- Cite (ACL):
- Xiang Dai and Heike Adel. 2020. An Analysis of Simple Data Augmentation for Named Entity Recognition. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3861–3867, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- An Analysis of Simple Data Augmentation for Named Entity Recognition (Dai & Adel, COLING 2020)
- PDF:
- https://preview.aclanthology.org/ml4al-ingestion/2020.coling-main.343.pdf