Bin Chen

2026

Efficiency vs. Verifiability in Evidence-Aware RAG: Does Prompt Compression Preserve Citation Grounding?
Aiyu Li | Qian Peng | Bin Chen
Proceedings of the Second Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)

Retrieval-augmented generation (RAG) is widely used in domain-specific and knowledge-intensive applications, where long prompts increase inference cost and may exceed context limits. Prompt compression is therefore appealing, but existing evaluations focus primarily on answer quality, overlooking whether compressed systems remain faithful to the retrieved evidence. In this paper, we ask: does compression that preserves answers also preserve grounding? Using Self-RAG and LLMLingua-2 in a controlled setting, we evaluate compressed RAG on ASQA in terms of both answer correctness and citation grounding. Under increasing compression, answer correctness drops by only 2-4%, whereas grounding drops by 40-50%. This stark divergence shows that answer-only evaluation can substantially overestimate the reliability of compressed RAG in evidence-aware scenarios. We further propose a lightweight hierarchical compression strategy that prioritizes evidence-bearing spans. It recovers nearly all grounding loss while maintaining comparable answer quality. Our results reveal a clear trade-off between efficiency and verifiability, and suggest that compression in RAG should be customized to downstream verification needs rather than treated as a one-size-fits-all efficiency intervention.
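
The abstract does not spell out how the hierarchical strategy selects spans, so the following is only an illustrative sketch of the general idea of prioritizing evidence-bearing spans under a token budget, not the authors' method. The function names, the `is_evidence` flag, and whitespace token counting are all assumptions made for the example; a real system would use a model tokenizer and a learned compressor such as LLMLingua-2.

```python
from typing import List, Tuple


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting stands in for a real tokenizer.
    return len(text.split())


def compress_evidence_first(spans: List[Tuple[str, bool]], budget: int) -> List[str]:
    """Keep spans within a token budget, admitting evidence-bearing spans first.

    spans: (text, is_evidence) pairs in document order (hypothetical input format).
    budget: maximum total token count of the kept spans.
    """
    kept = set()
    used = 0
    # Pass 1 admits evidence-bearing spans; pass 2 fills leftover budget with the rest.
    for want_evidence in (True, False):
        for i, (text, is_evidence) in enumerate(spans):
            if is_evidence != want_evidence:
                continue
            cost = count_tokens(text)
            if used + cost <= budget:
                kept.add(i)
                used += cost
    # Restore document order so citations stay next to their original context.
    return [spans[i][0] for i in sorted(kept)]


if __name__ == "__main__":
    passage = [
        ("Background filler text here.", False),
        ("The treaty was signed in 1848 [1].", True),
        ("More filler.", False),
    ]
    print(compress_evidence_first(passage, budget=9))
```

With a budget of 9 tokens, the cited sentence is admitted first and only the short filler sentence still fits, so the evidence survives compression while lower-priority context is dropped.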

Co-authors

Aiyu Li 1
Qian Peng 1

Venues

CustomNLP4U 1
WS 1