Zhipeng Zhang

2026

Thesis Proposal: Efficient KV Cache Reuse for Multi-Document Retrieval-Augmented Generation
Zhipeng Zhang | Dmitry Ilvovsky
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Retrieval-Augmented Generation (RAG) systems face efficiency bottlenecks in prefill due to attention mechanism, and traditional KV cache only accelerates decoding. In this context, reusing document-level KV cache computed for retrieved documents in previous sessions during the prefill stage appears to be a natural way to amortize computation, but it raises serious correctness challenges due to position and context misalignment across queries and sessions. This research proposes a multi-document KV cache reuse framework for multi-document RAG workloads across queries and sessions to resolve position misalignment and context misalignment, preserving accuracy while eliminating document-specific quadratic complexity in prefill. Theoretical analysis will establish conditions under which multi-document KV cache reuse remains stable and close to full recomputation, providing principled guarantees for both efficiency and accuracy. These results will enable deployment in existing RAG pipelines without architectural changes or model retraining. Crucially, to ensure robustness in real-world deployments, validation will extend beyond standard benchmarks to include noise-robustness tests and domain-specific workloads (e.g., legal). The research aims to empirically confirm these guarantees and demonstrate that substantial prefill speedups can be achieved without materially degrading task-level performance.

pdf bib abs

Writing effective rebuttals is a high-stakes task that demands more than linguistic fluency, as it requires precise alignment between reviewer intent and manuscript details. Current solutions typically treat this as a direct-to-text generation problem, suffering from hallucination, overlooked critiques, and a lack of verifiable grounding. To address these limitations, we introduce RebuttalAgent, the first multi-agents framework that reframes rebuttal generation as an evidence-centric planning task. Our system decomposes complex feedback into atomic concerns and dynamically constructs hybrid contexts by synthesizing compressed summaries with high-fidelity text while integrating an autonomous and on-demand external search module to resolve concerns requiring outside literature. By generating an inspectable response plan before drafting, RebuttalAgent ensures that every argument is explicitly anchored in internal or external evidence. We validate our approach on the proposed RebuttalBench and demonstrate that our pipeline outperforms strong baselines in coverage, faithfulness, and strategic coherence, offering a transparent and controllable assistant for the peer review process. Code will be released.

Co-authors

Jipeng Xiao 1

Yuanhao Yue 1

Venues

ACL1
EACL1

Fix author