Adaptive Textual Label Noise Learning based on Pre-trained Models

Shaohuan Cheng, Wenyu Chen, Fu Mingsheng, Xuanting Xie, Hong Qu


Abstract
Label noise in real-world scenarios is unpredictable and can even be a mixture of different noise types. To meet this challenge, we develop an adaptive textual label noise learning framework based on pre-trained models, which consists of an adaptive warm-up stage and a hybrid training stage. Specifically, an early stopping method that relies solely on the training set is designed to dynamically terminate the warm-up process according to how well the model has fit the data under different noise scenarios. The hybrid training stage incorporates several generalization strategies to gradually correct mislabeled instances, thereby making better use of noisy data. Experiments on multiple datasets demonstrate that our approach performs comparably to, or even surpasses, state-of-the-art methods in various noise scenarios, including scenarios with a mixture of multiple types of noise.
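The abstract does not spell out the stopping criterion, so the following is only a minimal sketch of what a training-set-only early stopping rule for a warm-up stage could look like. It uses a generic loss-plateau heuristic as a stand-in, not the method from the paper. The function names `should_stop_warmup` and `warmup_epochs`, and the parameters `window` and `tol`, are hypothetical and chosen for illustration.

```python
from typing import Sequence


def should_stop_warmup(train_losses: Sequence[float],
                       window: int = 2,
                       tol: float = 0.05) -> bool:
    """Hypothetical plateau rule (an assumption, not the paper's criterion).

    Returns True once the relative drop in mean training loss between the
    previous `window` epochs and the latest `window` epochs is below `tol`.
    Only training-set losses are used; no validation set is required.
    """
    if len(train_losses) < 2 * window:
        return False  # not enough history to compare two windows
    prev = sum(train_losses[-2 * window:-window]) / window
    last = sum(train_losses[-window:]) / window
    if prev <= 0:
        return True  # loss is already at zero; nothing left to fit
    return (prev - last) / prev < tol


def warmup_epochs(train_losses: Sequence[float],
                  window: int = 2,
                  tol: float = 0.05) -> int:
    """Replay a per-epoch loss curve and return the epoch at which warm-up stops.

    If the rule never fires, the full length of the curve is returned.
    """
    for epoch in range(1, len(train_losses) + 1):
        if should_stop_warmup(train_losses[:epoch], window, tol):
            return epoch
    return len(train_losses)
```

In this sketch, a loss curve that flattens early ends the warm-up early, while a curve that keeps dropping runs to the end, which is one simple way a warm-up length can adapt to the data without a clean validation set.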
Anthology ID:
2023.findings-emnlp.209
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3174–3188
URL:
https://aclanthology.org/2023.findings-emnlp.209
DOI:
10.18653/v1/2023.findings-emnlp.209
Cite (ACL):
Shaohuan Cheng, Wenyu Chen, Fu Mingsheng, Xuanting Xie, and Hong Qu. 2023. Adaptive Textual Label Noise Learning based on Pre-trained Models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 3174–3188, Singapore. Association for Computational Linguistics.
Cite (Informal):
Adaptive Textual Label Noise Learning based on Pre-trained Models (Cheng et al., Findings 2023)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2023.findings-emnlp.209.pdf