Pruning General Large Language Models into Customized Expert Models
Yiran Zhao, Guizhen Chen, Kenji Kawaguchi, Lidong Bing, Wenxuan Zhang
Abstract
Large Language Models (LLMs) have transformed natural language processing, yet their substantial model sizes often demand significant computational resources. To conserve computing resources and accelerate inference, it is crucial to prune redundant parameters, especially for experienced users who often need expert models tailored to specific downstream scenarios. However, current pruning methods primarily focus on maintaining models' general capabilities, either requiring extensive post-training or performing poorly due to coarse-grained pruning. In this work, we design a Custom Pruning method (Cus-Prun) to prune a large general model into a smaller, lightweight expert model positioned along the "language", "domain", and "task" dimensions. By identifying and pruning neurons irrelevant to each dimension, Cus-Prun creates expert models without any post-training. Our experiments demonstrate that Cus-Prun consistently outperforms other methods, achieving minimal loss in both expert and general capabilities across models from different families and sizes.
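The abstract describes the mechanism only at a high level: score neurons by their relevance to a target language, domain, or task, and remove the irrelevant ones without post-training. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' released code; the importance score (mean absolute activation), the relevance ratio, the keep ratio, and all function names are assumptions introduced here for clarity.

```python
# Hypothetical sketch of dimension-wise neuron pruning in the spirit of
# Cus-Prun; NOT the authors' implementation. All names, scores, and
# thresholds are illustrative assumptions.
import torch


def neuron_importance(acts: torch.Tensor) -> torch.Tensor:
    # Mean absolute activation per intermediate neuron over a calibration
    # corpus; acts has shape (num_tokens, num_neurons).
    return acts.abs().mean(dim=0)


def irrelevance_mask(target_acts: torch.Tensor,
                     general_acts: torch.Tensor,
                     keep_ratio: float = 0.7) -> torch.Tensor:
    # Score each neuron by how much more it fires on the target
    # language/domain/task corpus than on a general corpus, then mark the
    # lowest-scoring neurons for removal (an assumed criterion).
    relevance = neuron_importance(target_acts) / (
        neuron_importance(general_acts) + 1e-6)
    kept = relevance.topk(int(keep_ratio * relevance.numel())).indices
    mask = torch.ones_like(relevance, dtype=torch.bool)
    mask[kept] = False
    return mask  # True = prune this neuron


def prune_mlp(up_proj: torch.Tensor, down_proj: torch.Tensor,
              prune_mask: torch.Tensor):
    # Removing intermediate neuron i deletes row i of the up projection
    # and column i of the down projection; no post-training is applied.
    keep = ~prune_mask
    return up_proj[keep, :], down_proj[:, keep]


# Toy usage: one MLP block with 16 intermediate neurons, hidden size 8.
torch.manual_seed(0)
up, down = torch.randn(16, 8), torch.randn(8, 16)
target_acts = torch.randn(200, 16).abs()   # stand-in target activations
general_acts = torch.randn(200, 16).abs()  # stand-in general activations
mask = irrelevance_mask(target_acts, general_acts)
up_small, down_small = prune_mlp(up, down, mask)
print(up_small.shape, down_small.shape)  # (11, 8) and (8, 11)
```

In the full method, one such mask would presumably be computed per dimension ("language", "domain", "task") and the per-dimension masks combined before neurons are removed; the single-mask version above is kept deliberately minimal.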
- Anthology ID:
- 2025.findings-acl.1201
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2025
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 23377–23391
- URL:
- https://preview.aclanthology.org/landing_page/2025.findings-acl.1201/
- Cite (ACL):
- Yiran Zhao, Guizhen Chen, Kenji Kawaguchi, Lidong Bing, and Wenxuan Zhang. 2025. Pruning General Large Language Models into Customized Expert Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 23377–23391, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Pruning General Large Language Models into Customized Expert Models (Zhao et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/landing_page/2025.findings-acl.1201.pdf