Zhuo Chen

2026

PDTrim: Targeted Pruning for Prefill-Decode Disaggregation in Inference
Hao Zhang | Lyu Mengsi | Zhuo Chen | Yulong Ao | Yonghua Lin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) demonstrate exceptional capabilities across various tasks, but their deployment is constrained by high computational and memory costs. Model pruning is an effective way to reduce these demands. However, existing methods often ignore the characteristics of prefill-decode (PD) disaggregation as it is deployed in practice. In this paper, we propose a pruning method that is tightly integrated with PD disaggregation, enabling more precise pruning of blocks. Our approach constructs pruning and distillation sets to perform iterative block removal, yielding better pruning solutions. Moreover, we analyze the pruning sensitivity of the prefill and decode stages and identify removable blocks specific to each stage, which makes the method well suited to PD disaggregation deployment. Extensive experiments demonstrate that our approach consistently achieves strong performance in both PD disaggregation and PD unified (non-PD disaggregation) settings, and that it can also be extended to other non-block pruning methods. Under the same settings, our method achieves better performance and faster inference.
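The abstract describes iterative block removal guided by a held-out set. The following is a minimal, generic sketch of greedy iterative block removal, not the PDTrim algorithm itself: it assumes each block is a residual function, scores a candidate removal by the mean squared error against the unpruned model's outputs on a small calibration set, and removes the least harmful block one at a time. The distillation step and the per-stage (prefill vs. decode) analysis from the paper are not modeled; one could approximate the latter by calling the function separately with calibration inputs drawn from each stage, but that split is an assumption. All names (`iterative_block_removal`, `make_residual`, the toy scales) are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def run(blocks: Sequence[Block], active: Sequence[int], x: Vector) -> Vector:
    """Apply only the blocks whose indices are in `active`, in order."""
    for i in active:
        x = blocks[i](x)
    return x


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def removal_loss(blocks, active, calib, refs, candidate) -> float:
    """Average error vs. the full model if `candidate` is also dropped."""
    trial = [j for j in active if j != candidate]
    return sum(mse(run(blocks, trial, x), r) for x, r in zip(calib, refs)) / len(calib)


def iterative_block_removal(blocks, calib, n_remove) -> List[int]:
    """Greedily remove `n_remove` blocks, one per iteration (hypothetical sketch)."""
    active = list(range(len(blocks)))
    # Reference outputs come from the unpruned model.
    refs = [run(blocks, active, x) for x in calib]
    removed: List[int] = []
    for _ in range(n_remove):
        # Pick the block whose removal hurts the calibration loss least.
        best = min(active, key=lambda i: removal_loss(blocks, active, calib, refs, i))
        active.remove(best)
        removed.append(best)
    return removed


def make_residual(scale: float) -> Block:
    """Toy residual block: x -> x + scale * x (hypothetical stand-in for a layer)."""
    def block(x: Vector) -> Vector:
        return [v + scale * v for v in x]
    return block


# Toy model: blocks 1 and 3 contribute little, so they should be removed first.
blocks = [make_residual(s) for s in (0.5, 0.01, 0.3, 0.02)]
calib = [[1.0, 2.0], [0.5, -1.0]]
removed = iterative_block_removal(blocks, calib, n_remove=2)
print(removed)  # [1, 3]
```

Re-scoring all remaining blocks after every removal is what makes the procedure iterative rather than one-shot: a block's importance can change once its neighbors are gone, at the cost of one calibration pass per candidate per step.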

Venues

ACL (1)