Inference Scaling for Bridging Retrieval and Augmented Generation

Youngwon Lee, Seung-won Hwang, Daniel F Campos, Filip Graliński, Zhewei Yao, Yuxiong He


Abstract
Retrieval-augmented generation (RAG) has emerged as a popular approach to steering the output of a large language model (LLM) by incorporating retrieved contexts as inputs. However, existing work has observed a generator bias, whereby improving the retrieval results may negatively affect the generated outcome. In this work, we show that such bias can be mitigated through inference scaling, by aggregating inference calls over permuted orderings of the retrieved contexts. The proposed Mixture-of-Intervention (MoI) explicitly models the debiased utility of each passage with multiple forward passes to construct a new ranking. We also show that MoI can leverage the retriever's prior knowledge to reduce the computational cost, both by minimizing the number of permutations considered and by lowering the cost per LLM call. We showcase the effectiveness of MoI on diverse RAG tasks, improving ROUGE-L on MS MARCO and EM on HotpotQA by ~7 points.
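The mechanism described in the abstract lends itself to a short sketch. Below is a minimal, hypothetical Python rendering of the permutation-and-intervention idea: each passage's debiased utility is estimated by ablating it from randomly permuted context orderings and averaging the resulting drop in answer quality over multiple LLM calls, after which passages are re-ranked by that utility. The names (llm_answer_score, moi_utilities, moi_rerank) and the leave-one-out attribution are illustrative assumptions, not the paper's implementation; the paper additionally exploits the retriever's prior to cut the number of permutations and the per-call cost, which this sketch omits.

import random

# Illustrative sketch only (not the authors' code): estimate an
# order-debiased utility per retrieved passage via leave-one-out
# ablation averaged over random permutations, then re-rank.

def llm_answer_score(query: str, ordered_passages: list[str]) -> float:
    # Stand-in for one LLM forward pass scored against a reference
    # (e.g. ROUGE-L or EM); a trivial lexical-overlap proxy keeps the
    # sketch runnable end to end.
    words = set(" ".join(ordered_passages).lower().split())
    terms = query.lower().split()
    return sum(t in words for t in terms) / max(len(terms), 1)

def moi_utilities(query: str, passages: list[str],
                  num_perms: int = 4, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    utility = [0.0] * len(passages)
    for _ in range(num_perms):
        order = rng.sample(range(len(passages)), len(passages))
        full = llm_answer_score(query, [passages[i] for i in order])
        for target in range(len(passages)):
            ablated = [passages[i] for i in order if i != target]
            # Utility of a passage = drop in answer quality when it is
            # removed, averaged over permutations to wash out the
            # generator's position bias.
            utility[target] += (full - llm_answer_score(query, ablated)) / num_perms
    return utility

def moi_rerank(query: str, passages: list[str], **kwargs) -> list[str]:
    scores = moi_utilities(query, passages, **kwargs)
    return [p for _, p in sorted(zip(scores, passages),
                                 key=lambda pair: -pair[0])]

if __name__ == "__main__":
    q = "who proposed mixture of intervention"
    docs = ["an unrelated passage about weather",
            "mixture of intervention was proposed for RAG reranking"]
    print(moi_rerank(q, docs))  # utility-bearing passage should rank first

Note the cost of this naive variant: num_perms * (1 + len(passages)) LLM calls per query, which is exactly the budget the paper's retriever-guided pruning is meant to shrink.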
Anthology ID:
2025.findings-naacl.409
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7324–7339
URL:
https://preview.aclanthology.org/landing_page/2025.findings-naacl.409/
Cite (ACL):
Youngwon Lee, Seung-won Hwang, Daniel F Campos, Filip Graliński, Zhewei Yao, and Yuxiong He. 2025. Inference Scaling for Bridging Retrieval and Augmented Generation. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 7324–7339, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Inference Scaling for Bridging Retrieval and Augmented Generation (Lee et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-naacl.409.pdf