@inproceedings{wenjing-etal-2021-improving,
  title         = {Improving Low-Resource Named Entity Recognition via Label-Aware Data Augmentation and Curriculum Denoising},
  author        = {Zhu, Wenjing and
                   Liu, Jian and
                   Xu, Jinan and
                   Chen, Yufeng and
                   Zhang, Yujie},
  editor        = {Li, Sheng and
                   Sun, Maosong and
                   Liu, Yang and
                   Wu, Hua and
                   Liu, Kang and
                   Che, Wanxiang and
                   He, Shizhu and
                   Rao, Gaoqi},
  booktitle     = {Proceedings of the 20th Chinese National Conference on Computational Linguistics},
  month         = aug,
  year          = {2021},
  address       = {Huhhot, China},
  publisher     = {Chinese Information Processing Society of China},
  url           = {https://aclanthology.org/2021.ccl-1.101/},
  pages         = {1131--1142},
  language      = {eng},
  abstract      = {Deep neural networks have achieved state-of-the-art performances on named entity recognition (NER) with sufficient training data, while they perform poorly in low-resource scenarios due to data scarcity. To solve this problem, we propose a novel data augmentation method based on a pre-trained language model (PLM) and a curriculum learning strategy. Concretely, we use the PLM to generate diverse training instances through predicting different masked words, and design a task-specific curriculum learning strategy to alleviate the influence of noises. We evaluate the effectiveness of our approach on three datasets: CoNLL-2003, OntoNotes5.0, and MaScip, of which the first two are simulated low-resource scenarios and the last one is a real low-resource dataset in the material science domain. Experimental results show that our method consistently outperforms the baseline model. Specifically, our method achieves an absolute improvement of 3.46{\%} F1 score on the 1{\%} CoNLL-2003, 2.58{\%} on the 1{\%} OntoNotes5.0, and 0.99{\%} on the full of MaScip.},
  internal-note = {Review fixes: author names flipped to surname-first (Zhu, Liu, Xu, Chen, Zhang -- Chinese surnames, matching the editor-list convention); preview.aclanthology.org URL replaced with canonical aclanthology.org link; extraction artifacts in abstract (missing spaces/commas) repaired. Citation key left unchanged so existing \cite commands keep working.},
}
Markdown (Informal)
[Improving Low-Resource Named Entity Recognition via Label-Aware Data Augmentation and Curriculum Denoising](https://aclanthology.org/2021.ccl-1.101/) (Zhu et al., CCL 2021)
ACL