PAC-tuning: Fine-tuning Pre-trained Language Models with PAC-driven Perturbed Gradient Descent
Guangliang Liu, Zhiyu Xue, Xitong Zhang, Kristen Johnson, Rongrong Wang
Abstract
Fine-tuning pretrained language models (PLMs) for downstream tasks is a large-scale optimization problem, in which the choice of the training algorithm critically determines how well the trained model can generalize to unseen test data, especially in the context of few-shot learning. To achieve good generalization performance and avoid overfitting, techniques such as data augmentation and pruning are often applied. However, adding these regularizations necessitates heavy tuning of the hyperparameters of optimization algorithms, such as the popular Adam optimizer. In this paper, we propose a two-stage fine-tuning method, PAC-tuning, to address this optimization challenge. First, based on PAC-Bayes training, PAC-tuning directly minimizes the PAC-Bayes generalization bound to learn a proper parameter distribution. Second, PAC-tuning modifies the gradient by injecting noise with the variance learned in the first stage into the model parameters during training, resulting in a variant of perturbed gradient descent (PGD). In the past, the few-shot scenario posed difficulties for PAC-Bayes training because the PAC-Bayes bound, when applied to large models with limited training data, might not be stringent. Our experimental results across 5 GLUE benchmark tasks demonstrate that PAC-tuning successfully handles the challenges of fine-tuning tasks and outperforms strong baseline methods by a visible margin, further confirming the potential of applying PAC training to other settings where the Adam optimizer is currently used for training.
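To make the two-stage recipe in the abstract concrete, here is a minimal, self-contained PyTorch sketch: Stage 1 minimizes a PAC-Bayes-style objective (training loss under sampled weight noise plus a KL complexity term) to learn a per-parameter noise scale, and Stage 2 runs perturbed gradient descent that injects Gaussian noise with the learned scale before each gradient step. The toy linear classifier, the unit-variance Gaussian prior, the linear-penalty form of the bound, and all names (`noisy_loss`, `kl_penalty`, `log_sigma`) are illustrative assumptions, not the authors' implementation.

```python
# A minimal, self-contained PyTorch sketch of the two-stage recipe described above.
# The toy linear classifier, the unit-variance Gaussian prior, the linear-penalty
# form of the bound, and all names here are illustrative assumptions -- they are
# NOT the authors' implementation.
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d, num_classes = 256, 32, 2                    # toy data standing in for a GLUE task
X = torch.randn(n, d)
y = torch.randint(0, num_classes, (n,))

w = torch.zeros(d, num_classes, requires_grad=True)        # posterior mean (fine-tuned weights)
log_sigma = torch.full_like(w, -3.0, requires_grad=True)   # log posterior std, learned in Stage 1
w0 = w.detach().clone()                                    # prior mean = initialization

def noisy_loss(w, log_sigma):
    # Reparameterization trick: sample weights from the posterior N(w, sigma^2)
    # so that the gradient of the data term also flows into log_sigma.
    eps = torch.randn_like(w)
    logits = X @ (w + eps * log_sigma.exp())
    return F.cross_entropy(logits, y)

def kl_penalty(log_sigma, w, delta=0.05):
    # KL(N(w, sigma^2) || N(w0, 1)) plus the log(1/delta) confidence term, over n.
    # A simplified linear-penalty relaxation of a PAC-Bayes bound (assumption).
    var = (2.0 * log_sigma).exp()
    kl = 0.5 * (var + (w - w0) ** 2 - 1.0).sum() - log_sigma.sum()
    return (kl + math.log(1.0 / delta)) / n

# Stage 1: PAC-Bayes training -- learn w and the per-parameter noise scale jointly.
opt1 = torch.optim.Adam([w, log_sigma], lr=1e-2)
for _ in range(200):
    opt1.zero_grad()
    (noisy_loss(w, log_sigma) + kl_penalty(log_sigma, w)).backward()
    opt1.step()

# Stage 2: perturbed gradient descent (PGD) -- freeze the learned variance and
# inject noise with that variance into the parameters at every gradient step.
sigma = log_sigma.detach().exp()
opt2 = torch.optim.Adam([w], lr=1e-2)
for _ in range(200):
    opt2.zero_grad()
    noise = torch.randn_like(w) * sigma             # noise carries no gradient
    F.cross_entropy(X @ (w + noise), y).backward()  # gradient evaluated at perturbed weights
    opt2.step()                                     # update applied to the clean weights

with torch.no_grad():
    acc = ((X @ w).argmax(dim=1) == y).float().mean().item()
print(f"final train accuracy: {acc:.3f}")
```

The point the sketch tries to mirror is that the noise variance is learned by minimizing the bound in Stage 1 and then held fixed during the PGD stage, rather than being treated as another hyperparameter to tune by hand.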
- Anthology ID: 2023.emnlp-main.748
- Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 12178–12189
- URL: https://aclanthology.org/2023.emnlp-main.748
- DOI: 10.18653/v1/2023.emnlp-main.748
- Cite (ACL): Guangliang Liu, Zhiyu Xue, Xitong Zhang, Kristen Johnson, and Rongrong Wang. 2023. PAC-tuning: Fine-tuning Pre-trained Language Models with PAC-driven Perturbed Gradient Descent. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12178–12189, Singapore. Association for Computational Linguistics.
- Cite (Informal): PAC-tuning: Fine-tuning Pre-trained Language Models with PAC-driven Perturbed Gradient Descent (Liu et al., EMNLP 2023)
- PDF: https://preview.aclanthology.org/nschneid-patch-5/2023.emnlp-main.748.pdf