Familiarity-Aware Evidence Compression for Retrieval-Augmented Generation

Dongwon Jung, Qin Liu, Tenghao Huang, Ben Zhou, Muhao Chen


Abstract
Retrieval-augmented generation (RAG) improves large language models (LMs) by incorporating non-parametric knowledge through evidence retrieved from external sources. However, it often struggles to cope with inconsistent and irrelevant information that can distract the LM from its tasks, especially when multiple pieces of evidence are required. While compressing the retrieved evidence with a compression model aims to address this issue, the compressed evidence may still be unfamiliar to the target model used for the downstream task, which may then fail to utilize the evidence effectively. We propose FaviComp (Familiarity-Aware Evidence Compression), a novel training-free evidence compression technique that makes retrieved evidence more familiar to the target model while seamlessly integrating the model's parametric knowledge. Experimental results show that FaviComp consistently outperforms recent evidence compression baselines across multiple open-domain QA datasets, improving accuracy by up to 28.1% while achieving high compression rates. Additionally, we demonstrate the effective integration of both parametric and non-parametric knowledge during evidence compression.
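The abstract leaves the compression mechanism unspecified. As a rough, non-authoritative illustration of the general idea (decoding a compressed evidence summary so that it stays low-perplexity, i.e., familiar, to the target model while drawing on that model's parametric knowledge), the sketch below blends next-token distributions from a compression model, which sees the retrieved evidence, and the target model, which sees only the question, during greedy decoding. The model choice (gpt2 in both roles), the prompt templates, the helper name favicomp_style_compress, and the mixing weight alpha are all illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of familiarity-aware compression as blended decoding.
# The prompts, the use of gpt2 for both roles, and `alpha` are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                 # models must share a vocabulary
comp = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # compression model
tgt = AutoModelForCausalLM.from_pretrained("gpt2").eval()   # target (task) model

@torch.no_grad()
def favicomp_style_compress(evidence: str, question: str,
                            alpha: float = 0.5, max_new_tokens: int = 64) -> str:
    # The compressor is prompted with the evidence; the target sees only the question.
    comp_ids = tok(f"Evidence: {evidence}\nQuestion: {question}\nSummary:",
                   return_tensors="pt").input_ids
    tgt_ids = tok(f"Question: {question}\nContext:",
                  return_tensors="pt").input_ids
    generated = []
    for _ in range(max_new_tokens):
        comp_logp = comp(comp_ids).logits[:, -1, :].log_softmax(-1)
        tgt_logp = tgt(tgt_ids).logits[:, -1, :].log_softmax(-1)
        # Weighted mixture of log-probabilities; greedy pick from the blend.
        next_id = (alpha * comp_logp + (1 - alpha) * tgt_logp).argmax(-1, keepdim=True)
        if next_id.item() == tok.eos_token_id:
            break
        comp_ids = torch.cat([comp_ids, next_id], dim=-1)
        tgt_ids = torch.cat([tgt_ids, next_id], dim=-1)
        generated.append(next_id.item())
    return tok.decode(generated, skip_special_tokens=True)

print(favicomp_style_compress(
    "Marie Curie won Nobel Prizes in Physics (1903) and Chemistry (1911).",
    "In which fields did Marie Curie win Nobel Prizes?"))
```

Mixing in log-probability space amounts to a weighted geometric mean of the two distributions: a larger alpha favors fidelity to the retrieved evidence, while a smaller alpha favors tokens the target model already finds likely, i.e., more familiar.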
Anthology ID:
2025.findings-emnlp.878
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
16181–16196
URL:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.878/
DOI:
10.18653/v1/2025.findings-emnlp.878
Cite (ACL):
Dongwon Jung, Qin Liu, Tenghao Huang, Ben Zhou, and Muhao Chen. 2025. Familiarity-Aware Evidence Compression for Retrieval-Augmented Generation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 16181–16196, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Familiarity-Aware Evidence Compression for Retrieval-Augmented Generation (Jung et al., Findings 2025)
PDF:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.878.pdf
Checklist:
2025.findings-emnlp.878.checklist.pdf