LeanK: Learnable K Cache Channel Pruning for Efficient Decoding

Yike Zhang; Zhiyuan He; Huiqiang Jiang; Chengruidong Zhang; Yuqing Yang; Jianyong Wang; Lili Qiu

Abstract

Large language models (LLMs) enable long-context tasks but face efficiency challenges due to the growing key-value (KV) cache. We propose LeanK, a learning-based method that prunes unimportant key (K) cache channels by leveraging static channel sparsity. LeanK reduces GPU memory and accelerates decoding without sacrificing accuracy. Experiments demonstrate up to 70% K cache and 16%–18% V cache memory reduction, and 1.45× decoding speedup. We also provide insights into model channels and attention heads during long-context inference by analyzing the learned importance distribution. Our code is anonymously available at https://anonymous.4open.science/r/LeanK-7A87/README.md.
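As a rough illustration of what pruning K cache channels with a static mask means, the following sketch drops a fixed set of channels from every cached key vector and computes attention logits from the remaining channels only. It is a minimal sketch, not the authors' implementation: the function names, the toy dimensions, and the hand-written mask are all assumptions made for illustration. In LeanK the channel importance is learned, whereas here the mask is simply set by hand.

```python
import math
import random

def prune_k_channels(k_cache, keep_mask):
    """Drop masked-out channels from every cached key vector."""
    return [[x for x, keep in zip(row, keep_mask) if keep] for row in k_cache]

def attention_logits(q, k_pruned, keep_mask):
    """Scaled dot products that use only the kept channels of the query."""
    q_kept = [x for x, keep in zip(q, keep_mask) if keep]
    scale = 1.0 / math.sqrt(len(q))  # scale by the full head dimension
    return [scale * sum(a * b for a, b in zip(q_kept, row)) for row in k_pruned]

random.seed(0)
seq_len, head_dim = 8, 16  # toy sizes, chosen only for this example
k_cache = [[random.gauss(0, 1) for _ in range(head_dim)] for _ in range(seq_len)]
q = [random.gauss(0, 1) for _ in range(head_dim)]

# Hypothetical static mask: keep the first 5 of 16 channels.
# A learned method would produce this mask from training, not by hand.
keep_mask = [i < 5 for i in range(head_dim)]

k_pruned = prune_k_channels(k_cache, keep_mask)
logits = attention_logits(q, k_pruned, keep_mask)
kept_fraction = sum(keep_mask) / head_dim  # share of K cache channels stored
```

Because the mask is static, the same channels are dropped for every token, so the pruned cache can be stored densely at the smaller width and the query is sliced once per decoding step.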

Anthology ID: 2025.emnlp-main.1584
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 31110–31125
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1584/
Cite (ACL): Yike Zhang, Zhiyuan He, Huiqiang Jiang, Chengruidong Zhang, Yuqing Yang, Jianyong Wang, and Lili Qiu. 2025. LeanK: Learnable K Cache Channel Pruning for Efficient Decoding. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 31110–31125, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): LeanK: Learnable K Cache Channel Pruning for Efficient Decoding (Zhang et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1584.pdf
Checklist: 2025.emnlp-main.1584.checklist.pdf
