LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning

Haoyue Zhang, Hualei Zhang, Xiaosong Ma, Jie Zhang, Song Guo


Abstract
Large Language Models (LLMs) exhibit enhanced capabilities through Chain-of-Thought reasoning. However, the extended reasoning sequences introduce significant GPU memory overhead due to the enlarged key-value (KV) cache. Existing KV cache compression methods mitigate memory bottlenecks but struggle in long reasoning tasks. In this paper, we analyze attention patterns in reasoning tasks and reveal a **Token Importance Recurrence** phenomenon: a large proportion of tokens regain high attention after multiple decoding steps. Existing works fail to capture this phenomenon, which may lead to unpredictable eviction of such periodically critical tokens. To address this, we propose **LazyEviction**, an observation-window-based lagged eviction framework that retains latent recurring tokens by prioritizing eviction according to tokens’ recurrence patterns. Extensive experiments demonstrate that LazyEviction reduces the KV cache by 50% to 70% while maintaining comparable accuracy, outperforming existing KV cache compression baselines. Our implementation code can be found at https://github.com/Halo-949/LazyEviction.
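The abstract does not spell out the eviction rule, so the following is only a toy sketch of what a lagged, recurrence-aware eviction policy could look like, not the authors' implementation. It assumes that a token counts as "important" at a step when its attention weight meets a fixed threshold, that eviction runs only at the end of each observation window, and that tokens are ranked by how far they have overshot their longest previously observed gap between important steps. The names `TokenStats` and `lagged_evict`, the `threshold` parameter, and the scoring rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenStats:
    last_important: int  # last step at which attention met the threshold
    max_gap: int = 0     # longest gap observed between two important steps


def lagged_evict(attn_steps, budget, window, threshold=0.1):
    """Toy lagged eviction over a stream of per-step attention rows.

    attn_steps[t] maps cached token id -> attention weight at decoding step t.
    The token generated at step t enters the cache with id t.
    Returns the set of token ids still cached after the last step.
    All scoring choices here are illustrative assumptions.
    """
    stats = {}
    for step, row in enumerate(attn_steps):
        # The newly generated token enters the cache.
        stats[step] = TokenStats(last_important=step)

        # Observe attention: record recurrences for tokens still cached.
        for tok, weight in row.items():
            if tok in stats and weight >= threshold:
                s = stats[tok]
                s.max_gap = max(s.max_gap, step - s.last_important)
                s.last_important = step

        # Lagged eviction: act only at the end of an observation window.
        if (step + 1) % window == 0 and len(stats) > budget:
            def score(tok):
                s = stats[tok]
                # Time since last importance, minus the longest gap seen.
                # A low score means the token may still recur, so keep it.
                return (step - s.last_important - s.max_gap, -tok)

            keep = sorted(stats, key=score)[:budget]
            stats = {tok: stats[tok] for tok in keep}
    return set(stats)


# Token 0 regains attention every two steps; the other tokens never do.
rows = [{}, {}, {0: 0.5}, {}, {0: 0.5}, {}, {0: 0.5}, {}]
cached = lagged_evict(rows, budget=3, window=4)
```

In this toy run the recurring token 0 survives both eviction rounds even though it is the oldest entry, whereas a purely recency-based policy with the same budget would have dropped it.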
Anthology ID:
2026.acl-long.1683
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
36335–36352
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1683/
Cite (ACL):
Haoyue Zhang, Hualei Zhang, Xiaosong Ma, Jie Zhang, and Song Guo. 2026. LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 36335–36352, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning (Zhang et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1683.pdf
Checklist:
 2026.acl-long.1683.checklist.pdf