Abstract
Structured pruning is an effective technique for compressing pre-trained language models (PLMs), reducing model size and improving inference speed for efficient deployment. However, most existing pruning algorithms require retraining, which adds computational overhead. While some retraining-free approaches have been proposed for classification tasks, they still require a fully fine-tuned model for the task and may cause catastrophic performance degradation on generative tasks. To address these challenges, we propose P-pruning (pre-pruning), an innovative task-specific compression framework. P-pruning prunes redundant modules of PLMs before fine-tuning, reducing the costs associated with fine-tuning. We also introduce a pruning algorithm for this framework, which includes two techniques: (1) module clustering, which clusters the outputs of all heads and neurons based on the task input; and (2) centroid selection, which identifies the most salient element in each cluster and prunes the others. We apply our method to BERT and GPT-2 and evaluate its effectiveness on the GLUE, SQuAD, WikiText-2, WikiText-103, and PTB datasets. Experimental results demonstrate that our approach achieves higher performance in both classification and generative tasks, while also reducing the time required for fine-tuning.
- Anthology ID:
- 2024.lrec-main.1162
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- Publisher:
- ELRA and ICCL
- Pages:
- 13279–13289
- URL:
- https://preview.aclanthology.org/remove-affiliations/2024.lrec-main.1162/
- Cite (ACL):
- Pingjie Wang, Hongcheng Liu, Yanfeng Wang, and Yu Wang. 2024. Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13279–13289, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models (Wang et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/remove-affiliations/2024.lrec-main.1162.pdf
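
The two techniques named in the abstract, module clustering and centroid selection, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say which clustering algorithm is used or how saliency is measured, so the sketch assumes plain k-means with a deterministic farthest-point initialization, and it uses distance to the cluster centroid as a stand-in for saliency. All function names and the toy `head_outputs` data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def init_centroids(outputs, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centroids = [list(outputs[0])]
    while len(centroids) < k:
        farthest = max(
            outputs,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def cluster_modules(outputs, k, iters=20):
    """Group modules by their output vectors with plain k-means (assumed method)."""
    centroids = init_centroids(outputs, k)
    assignment = [0] * len(outputs)
    for _ in range(iters):
        # Assign each module to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in outputs
        ]
        # Recompute centroids; an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [v for v, a in zip(outputs, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignment, centroids


def select_kept_modules(outputs, assignment, centroids):
    """Keep one module per cluster: the one closest to the centroid (saliency proxy)."""
    kept = []
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            kept.append(
                min(members, key=lambda i: squared_distance(outputs[i], centroid))
            )
    return sorted(kept)


# Toy data: six hypothetical heads, each summarized by a 2-d output vector
# computed on task inputs. Heads 0-2 and heads 3-5 behave almost identically.
head_outputs = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
    [5.0, 5.1], [5.1, 5.0], [5.05, 5.05],
]
assignment, centroids = cluster_modules(head_outputs, k=2)
kept = select_kept_modules(head_outputs, assignment, centroids)
pruned = [i for i in range(len(head_outputs)) if i not in kept]
```

In this toy setting the number of clusters `k` acts as the compression budget: one module survives per cluster, so six heads with `k=2` leave two heads to fine-tune. How the paper actually sets the budget and scores saliency is not stated in the abstract.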