Panxuyan
2026
LaCo: Layer-wise Compensation for Pruned Large Language Models
Yingen Liu | Fan Wu | Panxuyan | Ruihui Li | Zhuo Tang | Kenli Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Pruning is essential for the efficient deployment of Large Language Models (LLMs); however, it causes severe performance degradation due to the structural distortion induced by sparsity. Existing recovery strategies, such as LoRA, predominantly employ global fine-tuning and often overlook the mechanistic root of this degradation: the layer-wise accumulation and amplification of local errors. To address this limitation, we propose LaCo (Layer-wise Compensation), a framework that reorients the recovery paradigm from global adaptation to hierarchical representation alignment. By sequentially optimizing each layer to reconstruct the model's hidden states, LaCo intercepts the error propagation chain at its source. Extensive experiments demonstrate that LaCo surpasses parameter-efficient baselines in both perplexity reduction and zero-shot reasoning. Notably, it reduces recovery-time memory usage to approximately 1/7 of the baseline and requires only 2,048 unlabeled samples to match a LoRA model trained on 50k examples, a ∼25× improvement in data efficiency.
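The sequential, layer-by-layer reconstruction described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the form of the compensation, so the example assumes small tanh layers, magnitude pruning, and a per-channel scale-and-bias correction fitted in closed form on unlabeled calibration inputs. All names (`prune`, `fit_affine`, `compensate`) are hypothetical.

```python
import math
import random

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def prune(W, sparsity):
    # Magnitude pruning (assumed here): zero the smallest-magnitude entries.
    flat = sorted(abs(w) for row in W for w in row)
    thresh = flat[int(sparsity * len(flat))]
    return [[w if abs(w) >= thresh else 0.0 for w in row] for row in W]

def mse(A, B):
    # Mean squared error between two equally shaped lists of vectors.
    total = sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))
    return total / (len(A) * len(A[0]))

def fit_affine(z, t):
    # Closed-form 1-D least squares: (a, b) minimising sum((a*z + b - t)^2).
    n = len(z)
    mz, mt = sum(z) / n, sum(t) / n
    var = sum((v - mz) ** 2 for v in z)
    if var == 0.0:
        return 1.0, mt - mz  # constant input: only the bias can help
    a = sum((v - mz) * (u - mt) for v, u in zip(z, t)) / var
    return a, mt - a * mz

def compensate(dense, pruned, calib):
    # Walk the layers in order. Each layer is corrected against the dense
    # model's hidden state, using the already-corrected student state as
    # input, so upstream errors are absorbed instead of accumulating.
    h_dense, h_student = calib, calib
    corrections, errors = [], []
    for W, Wp in zip(dense, pruned):
        target = [matvec(W, x) for x in h_dense]    # dense pre-activations
        z = [matvec(Wp, x) for x in h_student]      # pruned pre-activations
        ab = [fit_affine([r[j] for r in z], [r[j] for r in target])
              for j in range(len(W))]
        fixed = [[a * v + b for (a, b), v in zip(ab, r)] for r in z]
        errors.append((mse(z, target), mse(fixed, target)))
        corrections.append(ab)
        h_dense = [[math.tanh(v) for v in r] for r in target]
        h_student = [[math.tanh(v) for v in r] for r in fixed]
    return corrections, errors

random.seed(0)
DIM, LAYERS, SAMPLES = 8, 4, 64
dense = [[[random.gauss(0, 0.5) for _ in range(DIM)] for _ in range(DIM)]
         for _ in range(LAYERS)]
pruned = [prune(W, 0.5) for W in dense]
calib = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(SAMPLES)]

corrections, errors = compensate(dense, pruned, calib)
for i, (before, after) in enumerate(errors):
    print(f"layer {i}: pre-activation MSE {before:.4f} -> {after:.4f}")
```

Because the identity correction (scale 1, bias 0) is always a feasible solution, the fitted correction cannot increase a layer's reconstruction error on the calibration set. Only one layer's activations are needed at a time and no labels are used, which is in the spirit of the memory and data savings the abstract reports, though the toy makes no claim about the actual figures.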