MM-ShiftKV: Decode-Aware Prefill-Stage KV Selection for Multimodal Large Language Models
Jinsong Shu, Chenyang Wu, Zhongle Xie, Baokun Wang, Lidan Shou
Abstract
Key-Value (KV) caching is essential for efficient inference in multimodal large language models (MLLMs), yet its memory footprint grows linearly with context length and becomes a major bottleneck due to the large number of visual tokens. Recent prefill-stage KV selection methods estimate KV importance from prefilling statistics, implicitly assuming that prefilling-time queries are representative of those encountered during decoding. We show that this assumption breaks down in multimodal inference, where decoding-time queries exhibit substantially larger variance than prefilling-stage representations, leading to unstable KV importance estimation under tight cache budgets. As a result, small ranking errors can disproportionately discard semantically critical visual tokens and degrade grounding and reasoning performance. We propose MM-ShiftKV, a training-free, decode-aware and strictly prefill-only KV selection method. MM-ShiftKV approximates decoding-time query behavior during prefilling by constructing variance-expanded query proxies and estimates prompt KV importance based on their aggregated attention mass. Experiments on multimodal benchmarks demonstrate that MM-ShiftKV consistently outperforms existing methods under strict KV-cache budgets.
- Anthology ID:
- 2026.findings-acl.1447
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 28964–28982
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1447/
- Cite (ACL):
- Jinsong Shu, Chenyang Wu, Zhongle Xie, Baokun Wang, and Lidan Shou. 2026. MM-ShiftKV: Decode-Aware Prefill-Stage KV Selection for Multimodal Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 28964–28982, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- MM-ShiftKV: Decode-Aware Prefill-Stage KV Selection for Multimodal Large Language Models (Shu et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1447.pdf
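
The selection idea summarized in the abstract (variance-expanded query proxies, then ranking prompt KV entries by aggregated attention mass) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the proxies are constructed, so the sketch assumes the simplest possible expansion, scaling each prefill query's deviation from the mean query by a hypothetical factor `expand`. The function name `select_kv`, the single-head setup, and the toy vectors are likewise assumptions made for illustration.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def select_kv(queries, keys, budget, expand=2.0):
    """Rank prompt keys by attention mass from variance-expanded proxies.

    Assumption (not from the paper): a proxy is mean + expand * (q - mean),
    which scales the per-dimension query variance by expand ** 2.
    """
    dim = len(keys[0])
    n_q = len(queries)
    mean = [sum(q[j] for q in queries) / n_q for j in range(dim)]
    proxies = [
        [mean[j] + expand * (q[j] - mean[j]) for j in range(dim)]
        for q in queries
    ]
    # Aggregate the attention each key receives across all proxies.
    mass = [0.0] * len(keys)
    for p in proxies:
        scores = [
            sum(a * b for a, b in zip(p, k)) / math.sqrt(dim) for k in keys
        ]
        for i, w in enumerate(softmax(scores)):
            mass[i] += w
    # Keep the `budget` keys with the largest mass, in original order.
    ranked = sorted(range(len(keys)), key=lambda i: mass[i], reverse=True)
    return sorted(ranked[:budget]), mass

# Toy single-head example: three prefill queries, four prompt keys.
queries = [[1.0, 0.0], [0.8, 0.2], [0.9, -0.1]]
keys = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [1.0, 1.0]]
keep, mass = select_kv(queries, keys, budget=2)
```

Because every step uses only prefill-time queries, a scheme of this shape stays prefill-only. With `expand=1.0` the proxies are the prefill queries themselves, so the ranking reduces to plain prefill attention statistics.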