StructKV: Preserving the Structural Skeleton for Scalable Long-Context Inference

Zhirui Chen, Peiyang Liu, Ling Shao


Abstract
As Large Language Models (LLMs) scale to support context windows exceeding one million tokens, the linear growth of the Key-Value (KV) cache imposes severe memory capacity and bandwidth bottlenecks, constraining the efficiency of long-context inference. Existing compression approaches typically prioritize tokens by saliency in order to decouple prefill computation from decoding memory. However, these methods often rely on a local saliency snapshot taken at a single layer, and so systematically discard tokens that act as global information hubs across the network depth but appear temporarily dormant at the layer selected for pruning. To address this limitation, we propose StructKV, a structure-aware KV cache compression framework that introduces three core innovations. First, Global In-Degree Centrality aggregates attention patterns across the network depth to identify global information hubs. Second, Dynamic Pivot Detection uses information-theoretic metrics to adaptively locate the optimal layer for compression. Finally, Structural Propagation Decoupling separates the computational budget from the memory storage budget. Experimental results on the LongBench and RULER benchmarks demonstrate that StructKV effectively preserves long-range dependencies and retrieval robustness.
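The contrast the abstract draws between a single-layer saliency snapshot and a depth-aggregated score can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the uniform averaging over layers, and the top-k selection rule are all assumptions made for illustration. In-degree is read here in the usual graph sense, as the total attention a key token receives from all queries.

```python
from typing import List

Matrix = List[List[float]]  # one attention map: rows = queries, columns = keys


def in_degree(attn: Matrix) -> List[float]:
    """Column sums: total attention each key token receives from all queries."""
    n_keys = len(attn[0])
    return [sum(row[j] for row in attn) for j in range(n_keys)]


def global_in_degree(layers: List[Matrix]) -> List[float]:
    """Average per-layer in-degree over depth (uniform weighting is an assumption)."""
    per_layer = [in_degree(a) for a in layers]
    n_keys = len(per_layer[0])
    return [sum(s[j] for s in per_layer) / len(per_layer) for j in range(n_keys)]


def top_k_tokens(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring tokens, returned in original order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): 3 tokens, 2 layers.
# Token 0 receives little attention in layer 0 but is a hub in layer 1.
layer0 = [[0.1, 0.8, 0.1],
          [0.1, 0.8, 0.1],
          [0.1, 0.7, 0.2]]
layer1 = [[0.9, 0.05, 0.05],
          [0.9, 0.05, 0.05],
          [0.9, 0.05, 0.05]]

# Pruning from a layer-0 snapshot alone drops token 0.
local_keep = top_k_tokens(in_degree(layer0), k=2)

# Aggregating across depth retains token 0.
global_keep = top_k_tokens(global_in_degree([layer0, layer1]), k=2)

print(local_keep, global_keep)
```

With a budget of two tokens, the layer-0 snapshot keeps tokens 1 and 2, while the depth-aggregated score keeps tokens 0 and 1. The sketch ignores attention heads and causal masking, and it does not model Dynamic Pivot Detection or Structural Propagation Decoupling.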
Anthology ID:
2026.findings-acl.621
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12784–12797
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.621/
Cite (ACL):
Zhirui Chen, Peiyang Liu, and Ling Shao. 2026. StructKV: Preserving the Structural Skeleton for Scalable Long-Context Inference. In Findings of the Association for Computational Linguistics: ACL 2026, pages 12784–12797, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
StructKV: Preserving the Structural Skeleton for Scalable Long-Context Inference (Chen et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.621.pdf
Checklist:
 2026.findings-acl.621.checklist.pdf