Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models

Pingjie Wang, Hongcheng Liu, Yanfeng Wang, Yu Wang


Abstract
Structured pruning is an effective technique for compressing pre-trained language models (PLMs), reducing model size and improving inference speed for efficient deployment. However, most of existing pruning algorithms require retraining, leading to additional computational overhead. While some retraining-free approaches have been proposed for classification tasks, they still require a fully fine-tuned model for the task, and may cause catastrophic performance degradation on generative tasks. To address these challenges, we propose P-pruning (pre-pruning), an innovative task-specific compression framework. P-pruning prunes redundant modules of PLMs before fine-tuning, reducing the costs associated with fine-tuning. We also introduce a pruning algorithm for this framework, which includes two techniques: (1) module clustering, which clusters the outputs of all heads and neurons based on the task input; and (2) centroid selection, which identifies the most salient element in each cluster and prunes the others. We apply our method to BERT and GPT-2 and evaluate its effectiveness on GLUE, SQuAD, WikiText-2, WikiText-103, and PTB datasets. Experimental results demonstrate that our approach achieves higher performance in both classification and generative tasks, while also reducing the time required for fine-tuning.
Anthology ID:
2024.lrec-main.1162
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
13279–13289
Language:
URL:
https://aclanthology.org/2024.lrec-main.1162
DOI:
Bibkey:
Cite (ACL):
Pingjie Wang, Hongcheng Liu, Yanfeng Wang, and Yu Wang. 2024. Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13279–13289, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models (Wang et al., LREC-COLING 2024)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.lrec-main.1162.pdf