Abstract
Fine-tuning pre-trained language models suchas BERT has become a common practice dom-inating leaderboards across various NLP tasks. Despite its recent success and wide adoption,this process is unstable when there are onlya small number of training samples available. The brittleness of this process is often reflectedby the sensitivity to random seeds. In this pa-per, we propose to tackle this problem basedon the noise stability property of deep nets,which is investigated in recent literature (Aroraet al., 2018; Sanyal et al., 2020). Specifically,we introduce a novel and effective regulariza-tion method to improve fine-tuning on NLPtasks, referred to asLayer-wiseNoiseStabilityRegularization (LNSR). We extend the theo-ries about adding noise to the input and provethat our method gives a stabler regularizationeffect. We provide supportive evidence by ex-perimentally confirming that well-performingmodels show a low sensitivity to noise andfine-tuning with LNSR exhibits clearly bet-ter generalizability and stability. Furthermore,our method also demonstrates advantages overother state-of-the-art algorithms including L2-SP (Li et al., 2018), Mixout (Lee et al., 2020)and SMART (Jiang et al., 20)- Anthology ID:
- 2021.naacl-main.258
- Volume:
- Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Editors:
- Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
- Venue:
- NAACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 3229–3241
- Language:
- URL:
- https://aclanthology.org/2021.naacl-main.258
- DOI:
- 10.18653/v1/2021.naacl-main.258
- Cite (ACL):
- Hang Hua, Xingjian Li, Dejing Dou, Chengzhong Xu, and Jiebo Luo. 2021. Noise Stability Regularization for Improving BERT Fine-tuning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3229–3241, Online. Association for Computational Linguistics.
- Cite (Informal):
- Noise Stability Regularization for Improving BERT Fine-tuning (Hua et al., NAACL 2021)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/2021.naacl-main.258.pdf
- Data
- CoLA, GLUE, MRPC