Unsupervised Paraphrasing Consistency Training for Low Resource Named Entity Recognition

Rui Wang, Ricardo Henao


Abstract
Unsupervised consistency training is a semi-supervised learning technique that encourages consistency in model predictions between original and augmented data. For Named Entity Recognition (NER), existing approaches augment the input sequence via token replacement, assuming that the annotations at the replaced positions remain unchanged. In this paper, we explore paraphrasing as a more principled data augmentation scheme for unsupervised consistency training in NER. Specifically, we convert the Conditional Random Field (CRF) into a multi-label classification module and encourage consistency of entity appearance between the original and paraphrased sequences. Experiments show that our method is especially effective when annotations are limited.
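The abstract only outlines the method at a high level. As a rough illustration under stated assumptions (a mean-pooled encoder representation, a sigmoid multi-label head over entity types, and a simple MSE consistency term; the paper's exact CRF-derived formulation may differ), a minimal PyTorch sketch of such an entity-appearance consistency objective could look like the following. The names EntityAppearanceHead and consistency_loss are illustrative, not from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EntityAppearanceHead(nn.Module):
    """Multi-label classifier over entity types: for each type, predicts
    whether it appears anywhere in the input sequence (hypothetical sketch)."""
    def __init__(self, hidden_size: int, num_entity_types: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_entity_types)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden) from an encoder such as BERT
        pooled = token_states.mean(dim=1)               # sentence-level summary
        return torch.sigmoid(self.classifier(pooled))   # (batch, num_entity_types)

def consistency_loss(p_original: torch.Tensor, p_paraphrase: torch.Tensor) -> torch.Tensor:
    """Consistency between per-type appearance probabilities of the original
    and paraphrased sequences; the original view is treated as the target."""
    return F.mse_loss(p_paraphrase, p_original.detach())

# Usage on an unlabeled batch (encoder(...) assumed to return token states):
# loss = consistency_loss(head(encoder(original_ids)), head(encoder(paraphrase_ids)))

The key design choice this sketch reflects is that paraphrasing changes token positions, so consistency is enforced on sentence-level entity appearance rather than on per-token labels.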
Anthology ID:
2021.emnlp-main.430
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5303–5308
URL:
https://aclanthology.org/2021.emnlp-main.430
DOI:
10.18653/v1/2021.emnlp-main.430
Cite (ACL):
Rui Wang and Ricardo Henao. 2021. Unsupervised Paraphrasing Consistency Training for Low Resource Named Entity Recognition. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5303–5308, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Paraphrasing Consistency Training for Low Resource Named Entity Recognition (Wang & Henao, EMNLP 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2021.emnlp-main.430.pdf