Enhancing RAG Efficiency with Adaptive Context Compression

Shuyu Guo, Shuo Zhang, Zhaochun Ren


Abstract
Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge but incurs significant inference costs due to lengthy retrieved contexts. While context compression mitigates this issue, existing methods apply fixed compression rates, over-compressing simple queries and under-compressing complex ones. We propose Adaptive Context Compression for RAG (ACC-RAG), a framework that dynamically adjusts compression rates according to input complexity, optimizing inference efficiency without sacrificing accuracy. ACC-RAG combines a hierarchical compressor, which produces multi-granular embeddings, with a context selector that retains only the minimal sufficient information, akin to human skimming. Evaluated on Wikipedia and five QA datasets, ACC-RAG outperforms fixed-rate methods and achieves over 4× faster inference than standard RAG while maintaining or improving accuracy.
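
To make the mechanism concrete, here is a minimal Python sketch of the idea the abstract describes: compress retrieved passages at several granularities, then let a selector keep the coarsest level that still looks sufficient for the query. The encoder, the mean-pooling scheme, the similarity threshold, and every name in the snippet are illustrative assumptions rather than the authors' implementation.

# Minimal sketch of adaptive context compression (assumed names/heuristics,
# not the ACC-RAG implementation from the paper).
import numpy as np

rng = np.random.default_rng(0)

def embed(texts, dim=64):
    # Stand-in encoder: maps each text to a random unit vector (assumption).
    vecs = rng.normal(size=(len(texts), dim))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def hierarchical_compress(passages, rates=(1, 4, 16)):
    # Multi-granular compression: at rate r, pool every r consecutive
    # passage embeddings into a single vector (mean pooling as a stand-in).
    base = embed(passages)
    levels = {}
    for r in rates:
        pooled = [base[i:i + r].mean(axis=0) for i in range(0, len(base), r)]
        levels[r] = np.stack(pooled)
    return levels

def select_context(query, levels, threshold=0.35):
    # Selector: try the coarsest level first and move to finer granularity
    # until the retained embeddings look "sufficient" for the query; here
    # sufficiency is max cosine similarity above a threshold (a stand-in
    # criterion for the learned selector described in the abstract).
    q = embed([query])[0]
    for r in sorted(levels, reverse=True):  # highest compression rate first
        sims = levels[r] @ q
        if sims.max() >= threshold:
            keep = sims >= threshold
            return r, levels[r][keep]
    r = min(levels)  # fall back to the finest granularity
    return r, levels[r]

passages = [f"passage {i}" for i in range(16)]
levels = hierarchical_compress(passages)
rate, ctx = select_context("what is adaptive compression?", levels)
print(f"chosen compression rate: {rate}x, kept {len(ctx)} embedding(s)")

Simple queries that are answered at a coarse level thus feed far fewer context embeddings to the LLM, which is where the inference speedup would come from under this sketch's assumptions.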
Anthology ID:
2025.findings-emnlp.1307
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
24061–24076
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1307/
DOI:
10.18653/v1/2025.findings-emnlp.1307
Cite (ACL):
Shuyu Guo, Shuo Zhang, and Zhaochun Ren. 2025. Enhancing RAG Efficiency with Adaptive Context Compression. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 24061–24076, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Enhancing RAG Efficiency with Adaptive Context Compression (Guo et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1307.pdf
Checklist:
2025.findings-emnlp.1307.checklist.pdf