Improving Low-Resource Named Entity Recognition via Label-Aware Data Augmentation and Curriculum Denoising

Zhu Wenjing, Liu Jian, Xu Jinan, Chen Yufeng, Zhang Yujie


Abstract
Deep neural networks have achieved state-of-the-art performance on named entity recognition (NER) when sufficient training data is available, but they perform poorly in low-resource scenarios due to data scarcity. To solve this problem, we propose a novel data augmentation method based on a pre-trained language model (PLM) and a curriculum learning strategy. Concretely, we use the PLM to generate diverse training instances by predicting different masked words, and we design a task-specific curriculum learning strategy to alleviate the influence of noise. We evaluate the effectiveness of our approach on three datasets: CoNLL-2003, OntoNotes 5.0, and MaScip, of which the first two are simulated low-resource scenarios and the last is a real low-resource dataset in the materials science domain. Experimental results show that our method consistently outperforms the baseline model. Specifically, our method achieves absolute F1 improvements of 3.46% on 1% of CoNLL-2003, 2.58% on 1% of OntoNotes 5.0, and 0.99% on the full MaScip dataset.
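The two ideas named in the abstract, mask-and-predict augmentation and curriculum ordering of noisy data, can be illustrated with a minimal sketch. This is not the authors' implementation: the names augment, curriculum_order, and toy_fill_mask are hypothetical, the toy predictor stands in for a real PLM, and both the rule of keeping labels unchanged and the "originals first, then easiest augmented instances" ordering are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Instance = Tuple[List[str], List[str]]  # (tokens, BIO labels)


def augment(instance: Instance,
            fill_mask: Callable[[List[str], int, str], List[str]],
            mask_token: str = "[MASK]",
            n_variants: int = 2,
            seed: int = 0) -> List[Instance]:
    """Mask one token per variant and substitute a predicted word.

    Assumption: the label sequence is copied unchanged, so the predictor
    is expected to propose words compatible with the masked token's label.
    """
    tokens, labels = instance
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        pos = rng.randrange(len(tokens))
        masked = tokens[:pos] + [mask_token] + tokens[pos + 1:]
        # Drop candidates identical to the original word.
        candidates = [w for w in fill_mask(masked, pos, labels[pos])
                      if w != tokens[pos]]
        if not candidates:
            continue
        new_tokens = tokens[:pos] + [rng.choice(candidates)] + tokens[pos + 1:]
        variants.append((new_tokens, list(labels)))
    return variants


def curriculum_order(originals: Sequence[Instance],
                     augmented: Sequence[Instance],
                     difficulty: Callable[[Instance], float]) -> List[Instance]:
    """Assumed schedule: clean originals first, then augmented data
    sorted from lowest to highest difficulty (a proxy for noise)."""
    return list(originals) + sorted(augmented, key=difficulty)


def toy_fill_mask(masked_tokens: List[str], position: int, label: str) -> List[str]:
    """Hypothetical stand-in for a PLM: looks up candidates by label only."""
    vocab = {
        "B-PER": ["Alice", "Bob"],
        "B-LOC": ["Paris", "Berlin"],
        "O": ["visited", "saw", "left"],
    }
    return vocab.get(label, [])


original = (["Alice", "visited", "Paris"], ["B-PER", "O", "B-LOC"])
augmented = augment(original, toy_fill_mask, n_variants=4)
schedule = curriculum_order(
    [original],
    augmented,
    # Illustrative difficulty: number of tokens changed from the original.
    difficulty=lambda inst: sum(a != b for a, b in zip(inst[0], original[0])),
)
```

In a real setting, toy_fill_mask would be replaced by a masked language model's top predictions, and the difficulty function by whatever noise estimate the training recipe uses; both choices are outside what the abstract specifies.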
Anthology ID:
2021.ccl-1.101
Volume:
Proceedings of the 20th Chinese National Conference on Computational Linguistics
Month:
August
Year:
2021
Address:
Huhhot, China
Editors:
Sheng Li (李生), Maosong Sun (孙茂松), Yang Liu (刘洋), Hua Wu (吴华), Kang Liu (刘康), Wanxiang Che (车万翔), Shizhu He (何世柱), Gaoqi Rao (饶高琦)
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
1131–1142
Language:
English
URL:
https://preview.aclanthology.org/icon-24-ingestion/2021.ccl-1.101/
Cite (ACL):
Zhu Wenjing, Liu Jian, Xu Jinan, Chen Yufeng, and Zhang Yujie. 2021. Improving Low-Resource Named Entity Recognition via Label-Aware Data Augmentation and Curriculum Denoising. In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 1131–1142, Huhhot, China. Chinese Information Processing Society of China.
Cite (Informal):
Improving Low-Resource Named Entity Recognition via Label-Aware Data Augmentation and Curriculum Denoising (Wenjing et al., CCL 2021)
PDF:
https://preview.aclanthology.org/icon-24-ingestion/2021.ccl-1.101.pdf
Data
OntoNotes 5.0