FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing

James Seale Smith, Chi-Heng Lin, Shikhar Tuli, Haris Jeelani, Shangqian Gao, Yilin Shen, Hongxia Jin, Yen-Chang Hsu


Abstract
The rapid proliferation of large language models (LLMs) in natural language processing (NLP) has created a critical need for techniques that enable efficient deployment on memory-constrained devices without compromising performance. We present a pruning method for LLMs that selectively removes model blocks based on an importance score and replaces them with a low-parameter replacement strategy. Specifically, we propose a principled metric to replace each pruned block using a weight-sharing mechanism that leverages unpruned counterparts from the model and block-specific low-rank adapters. Furthermore, we facilitate the learning of these replacement blocks with output feature normalization and an adapter initialization scheme built on low-rank SVD reconstructions. Empirical evaluations demonstrate substantial performance gains over existing methods, achieving state-of-the-art performance on 5/6 benchmarks at a 30% compression rate and 6/6 benchmarks at a 40% compression rate. We also demonstrate that our approach can extend smaller models, boosting performance on 6/6 benchmarks using only ~0.3% of tokens for extended training, with minimal additional parameter cost.
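The abstract describes the replacement mechanism only at a high level. The sketch below is a minimal illustration, not the authors' released implementation, of one plausible reading: a pruned block is replaced by a weight-shared unpruned counterpart plus a block-specific low-rank adapter, with the adapter initialized from a truncated SVD of a weight difference and the output passed through a normalization layer. All class and argument names (e.g., SharedBlockWithAdapter, init_delta) are illustrative assumptions.

```python
# Illustrative sketch only; FlexiGPT's exact formulation may differ.
from typing import Optional

import torch
import torch.nn.functional as F


def lowrank_init_from_svd(delta: torch.Tensor, rank: int):
    """Factor a weight difference `delta` into A @ B of rank `rank` via truncated SVD."""
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    A = U[:, :rank] * S[:rank].sqrt()              # (out_features, rank)
    B = S[:rank].sqrt().unsqueeze(1) * Vh[:rank]   # (rank, in_features)
    return A, B


class SharedBlockWithAdapter(torch.nn.Module):
    """Stand-in for a pruned block: reuse a kept block's linear weights (shared)
    and add a trainable low-rank adapter, followed by output feature
    normalization (LayerNorm assumed here)."""

    def __init__(self, shared_linear: torch.nn.Linear, rank: int = 8,
                 init_delta: Optional[torch.Tensor] = None):
        super().__init__()
        self.shared = shared_linear  # weights shared with the unpruned counterpart
        out_f, in_f = shared_linear.weight.shape
        self.A = torch.nn.Parameter(torch.zeros(out_f, rank))
        self.B = torch.nn.Parameter(torch.zeros(rank, in_f))
        if init_delta is not None:
            # SVD-based initialization, e.g. from (pruned_weight - shared_weight).
            A, B = lowrank_init_from_svd(init_delta, rank)
            self.A.data.copy_(A)
            self.B.data.copy_(B)
        self.norm = torch.nn.LayerNorm(out_f)  # output feature normalization

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.shared(x) + F.linear(x, self.A @ self.B)
        return self.norm(y)
```

Under these assumptions, only the adapter (A, B) and the normalization layer add parameters per replaced block, which is what keeps the replacement strategy low-parameter.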
Anthology ID:
2025.naacl-long.31
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
718–730
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.31/
Cite (ACL):
James Seale Smith, Chi-Heng Lin, Shikhar Tuli, Haris Jeelani, Shangqian Gao, Yilin Shen, Hongxia Jin, and Yen-Chang Hsu. 2025. FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 718–730, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing (Smith et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.31.pdf