RPO: Retrieval Preference Optimization for Robust Retrieval-Augmented Generation

Shi-Qi Yan, Quan Liu, Zhen-Hua Ling


Abstract
While Retrieval-Augmented Generation (RAG) has shown promise in leveraging external knowledge, its generation quality depends heavily on the relevance and accuracy of the retrieved context. Large language models (LLMs) struggle to judge the correctness of externally retrieved, non-parametric knowledge when it differs from their internal memorization, leading to knowledge conflicts during response generation. To this end, we introduce Retrieval Preference Optimization (RPO), a lightweight and effective alignment method that adaptively leverages multi-source knowledge according to retrieval relevance. An implicit representation of retrieval relevance is derived and incorporated into the reward model, integrating retrieval evaluation and response generation into a single model and removing the separate retrieval-quality assessment step that previous methods require. Notably, RPO is a RAG-dedicated alignment approach that quantifies awareness of retrieval relevance during training, and is the first to overcome the associated mathematical obstacles. Experiments on four datasets demonstrate that RPO outperforms RAG by 4-10% in accuracy without any extra components, exhibiting robust generalization.
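
The abstract describes folding an implicit retrieval-relevance signal into the preference reward so that a single model handles both retrieval evaluation and response generation. As a rough illustration only, the sketch below shows a DPO-style preference loss in which a hypothetical relevance weight scales the implicit reward margin; the function name, the weighting form, and all parameters are assumptions for illustration, not the paper's actual RPO objective.

    import torch
    import torch.nn.functional as F

    def rpo_sketch_loss(logp_chosen, logp_rejected,
                        ref_logp_chosen, ref_logp_rejected,
                        relevance, beta=0.1):
        """Hypothetical preference loss: a DPO-style implicit reward whose
        margin is scaled by a retrieval-relevance term (an assumed form,
        not the paper's derivation)."""
        # Implicit rewards, as in DPO: beta * (log pi(y|x) - log pi_ref(y|x))
        reward_chosen = beta * (logp_chosen - ref_logp_chosen)
        reward_rejected = beta * (logp_rejected - ref_logp_rejected)
        # Scale the preference margin by the (assumed) relevance of the retrieved context
        margin = relevance * (reward_chosen - reward_rejected)
        return -F.logsigmoid(margin).mean()

    # Toy usage with fabricated sequence log-probabilities for a batch of two examples
    logp_c = torch.tensor([-12.0, -15.0])
    logp_r = torch.tensor([-14.0, -13.5])
    ref_c = torch.tensor([-12.5, -15.2])
    ref_r = torch.tensor([-13.8, -13.0])
    relevance = torch.tensor([0.9, 0.2])  # assumed relevance scores in [0, 1]
    print(rpo_sketch_loss(logp_c, logp_r, ref_c, ref_r, relevance))

Under this assumed weighting, examples with highly relevant retrieved context contribute a larger preference margin, while low-relevance examples are down-weighted, which is one plausible way a relevance-aware reward could steer the model between retrieved and parametric knowledge.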
Anthology ID:
2025.acl-long.261
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
5228–5240
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.261/
Cite (ACL):
Shi-Qi Yan, Quan Liu, and Zhen-Hua Ling. 2025. RPO: Retrieval Preference Optimization for Robust Retrieval-Augmented Generation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5228–5240, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
RPO: Retrieval Preference Optimization for Robust Retrieval-Augmented Generation (Yan et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.261.pdf