Abstract
Deep neural networks have achieved state-of-the-art performance on named entity recognition (NER) when sufficient training data is available, but they perform poorly in low-resource scenarios due to data scarcity. To address this problem, we propose a novel data augmentation method based on a pre-trained language model (PLM) and a curriculum learning strategy. Concretely, we use the PLM to generate diverse training instances by predicting different masked words, and we design a task-specific curriculum learning strategy to alleviate the influence of noise. We evaluate the effectiveness of our approach on three datasets: CoNLL-2003, OntoNotes 5.0, and MaScip, of which the first two are simulated low-resource scenarios and the last is a real low-resource dataset in the materials science domain. Experimental results show that our method consistently outperforms the baseline model. Specifically, our method achieves absolute F1 improvements of 3.46% on 1% of CoNLL-2003, 2.58% on 1% of OntoNotes 5.0, and 0.99% on the full MaScip dataset.
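The augmentation step described in the abstract, generating diverse training instances by predicting masked words with a PLM, can be sketched as follows. This is not the authors' exact procedure but a minimal illustration: it assumes a BERT-style fill-mask model from Hugging Face transformers (the model name, replacement probability, and helper function are all hypothetical) and keeps the augmentation label-aware by only replacing tokens tagged O, so entity spans and their labels are preserved.

```python
import random
from transformers import pipeline

# Load a masked language model; "bert-base-cased" is an assumption,
# not necessarily the PLM used in the paper.
fill_mask = pipeline("fill-mask", model="bert-base-cased")

def augment_sentence(tokens, labels, replace_prob=0.15, top_k=5):
    """Return an augmented (tokens, labels) pair in which some non-entity
    tokens are replaced by alternatives predicted by the masked LM."""
    new_tokens = list(tokens)
    for i, (tok, lab) in enumerate(zip(tokens, labels)):
        # Label-aware constraint: never modify tokens inside entity spans,
        # so the original label sequence remains valid for the new sentence.
        if lab != "O" or random.random() > replace_prob:
            continue
        masked = list(new_tokens)
        masked[i] = fill_mask.tokenizer.mask_token
        predictions = fill_mask(" ".join(masked), top_k=top_k)
        # Keep only predictions that actually differ from the original word.
        candidates = [p["token_str"].strip() for p in predictions
                      if p["token_str"].strip().lower() != tok.lower()]
        if candidates:
            new_tokens[i] = random.choice(candidates)
    return new_tokens, labels  # labels unchanged by construction

# Example: entity tokens ("John", "New", "York") are left untouched.
tokens = ["John", "lives", "in", "New", "York", "."]
labels = ["B-PER", "O", "O", "B-LOC", "I-LOC", "O"]
print(augment_sentence(tokens, labels))
```

The curriculum denoising step described in the abstract would then schedule these augmented instances during training, for example ordering them from cleaner to noisier (one plausible proxy being the masked LM's prediction confidence); the paper's specific task-specific scheduling criterion is defined in the full text.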
- Anthology ID:
- 2021.ccl-1.101
- Volume:
- Proceedings of the 20th Chinese National Conference on Computational Linguistics
- Month:
- August
- Year:
- 2021
- Address:
- Huhhot, China
- Editors:
- Sheng Li (李生), Maosong Sun (孙茂松), Yang Liu (刘洋), Hua Wu (吴华), Kang Liu (刘康), Wanxiang Che (车万翔), Shizhu He (何世柱), Gaoqi Rao (饶高琦)
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 1131–1142
- Language:
- English
- URL:
- https://aclanthology.org/2021.ccl-1.101
- Cite (ACL):
- Zhu Wenjing, Liu Jian, Xu Jinan, Chen Yufeng, and Zhang Yujie. 2021. Improving Low-Resource Named Entity Recognition via Label-Aware Data Augmentation and Curriculum Denoising. In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 1131–1142, Huhhot, China. Chinese Information Processing Society of China.
- Cite (Informal):
- Improving Low-Resource Named Entity Recognition via Label-Aware Data Augmentation and Curriculum Denoising (Wenjing et al., CCL 2021)
- PDF:
- https://aclanthology.org/2021.ccl-1.101.pdf
- Data
- OntoNotes 5.0