When Efficiency Meets Safety: A Benchmark Security Analysis of KV Cache Compression in Large Language Models

Xiaoxiao Ma; Kuofeng Gao; Zeyi Lu; Wenxi Jiang; Hao Fang; Hao Wu; Bin Chen; Shu-Tao Xia

Abstract

Key-Value (KV) caching is widely used in large language models (LLMs) to enable efficient long-context inference, yet its security implications remain underexplored. We present the first systematic study of how KV cache compression interacts with jailbreak attacks, evaluating four model families under diverse jailbreak attacks. We identify a double-edged effect. (i) On one hand, compression can induce **Accidental Robustness**, in which optimization-based and encoding-based attacks fail for two reasons: Malicious Semantic Eviction, where the attack's own attention redirection lowers the cache importance of the malicious query, and Gradient Mismatch, where discrete compression operations break jailbreak optimization. (ii) On the other hand, a **Vulnerability Paradox** arises under merging-based compression for human-designed attacks, where aggressive merging in shallow layers triggers functional head collapse and amplifies attack success rates (ASR). To address this, we propose **Safe-CAM**, a history-aware, per-head feedback merging strategy that prevents safety degradation while maintaining efficiency. Experiments show that Safe-CAM fully restores safety (0% ASR) and improves benign task performance with minimal overhead. Our study highlights that KV cache compression is not only an efficiency mechanism but also a safety-critical design factor in LLM deployment.

Anthology ID: 2026.acl-long.1123
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 24472–24485
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1123/
Cite (ACL): Xiaoxiao Ma, Kuofeng Gao, Zeyi Lu, Wenxi Jiang, Hao Fang, Hao Wu, Bin Chen, and Shu-Tao Xia. 2026. When Efficiency Meets Safety: A Benchmark Security Analysis of KV Cache Compression in Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 24472–24485, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): When Efficiency Meets Safety: A Benchmark Security Analysis of KV Cache Compression in Large Language Models (Ma et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1123.pdf
Checklist: 2026.acl-long.1123.checklist.pdf
