SaFER: A Robust and Efficient Framework for Fine-tuning BERT-based Classifier with Noisy Labels

Zhenting Qi; Xiaoyu Tan; Chao Qu; Yinghui Xu; Yuan Qi

doi:10.18653/v1/2023.acl-industry.38

Abstract

Learning from noisy datasets is a challenging problem when pre-trained language models are applied to real-world text classification tasks. In many industrial applications, acquiring task-specific datasets with 100% accurate labels is difficult, so many datasets carry label noise at varying levels. Previous work has shown that existing noise-handling methods do not improve the peak performance of BERT on noisy datasets and may even degrade it. In this paper, we propose SaFER, a robust and efficient fine-tuning framework for BERT-based text classifiers that combats label noise without access to any clean data for training or validation. Using a label-agnostic early-stopping strategy and self-supervised learning, the proposed framework achieves superior performance in both accuracy and speed on multiple text classification benchmarks. The trained model has been fully deployed in several industrial biomedical literature mining tasks, where it demonstrates high effectiveness and efficiency.

Anthology ID: 2023.acl-industry.38
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Sunayana Sitaram, Beata Beigman Klebanov, Jason D Williams
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 390–403
URL: https://aclanthology.org/2023.acl-industry.38
DOI: 10.18653/v1/2023.acl-industry.38
Cite (ACL): Zhenting Qi, Xiaoyu Tan, Chao Qu, Yinghui Xu, and Yuan Qi. 2023. SaFER: A Robust and Efficient Framework for Fine-tuning BERT-based Classifier with Noisy Labels. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pages 390–403, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): SaFER: A Robust and Efficient Framework for Fine-tuning BERT-based Classifier with Noisy Labels (Qi et al., ACL 2023)
PDF: https://preview.aclanthology.org/dois-2013-emnlp/2023.acl-industry.38.pdf
