Thesis Proposal: Efficient KV Cache Reuse for Multi-Document Retrieval-Augmented Generation

Zhipeng Zhang, Dmitry Ilvovsky


Abstract
Retrieval-Augmented Generation (RAG) systems face efficiency bottlenecks in the prefill stage due to the attention mechanism, and a traditional KV cache accelerates only decoding. Reusing, during prefill, the document-level KV cache computed for retrieved documents in previous sessions is a natural way to amortize this computation, but it raises serious correctness challenges because positions and context become misaligned across queries and sessions. This research proposes a KV cache reuse framework for multi-document RAG workloads that resolves position misalignment and context misalignment across queries and sessions, preserving accuracy while eliminating document-specific quadratic complexity in prefill. Theoretical analysis will establish conditions under which multi-document KV cache reuse remains stable and close to full recomputation, providing principled guarantees for both efficiency and accuracy. These results will enable deployment in existing RAG pipelines without architectural changes or model retraining. To ensure robustness in real-world deployments, validation will extend beyond standard benchmarks to include noise-robustness tests and domain-specific workloads (e.g., legal). The research aims to confirm these guarantees empirically and to demonstrate that substantial prefill speedups can be achieved without materially degrading task-level performance.
Anthology ID:
2026.eacl-srw.11
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Selene Baez Santamaria, Sai Ashish Somayajula, Atsuki Yamaguchi
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
160–169
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.11/
Cite (ACL):
Zhipeng Zhang and Dmitry Ilvovsky. 2026. Thesis Proposal: Efficient KV Cache Reuse for Multi-Document Retrieval-Augmented Generation. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 160–169, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Thesis Proposal: Efficient KV Cache Reuse for Multi-Document Retrieval-Augmented Generation (Zhang & Ilvovsky, EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.11.pdf