Q-PRM: Adaptive Query Rewriting for Retrieval-Augmented Generation via Step-level Process Supervision

Xiaopeng Ye; Chen Xu (许晨, 徐晨); Chaoliang Zhang; Zhaocheng Du; Jun Xu; Gang Wang; Zhenhua Dong

doi:10.18653/v1/2025.findings-emnlp.817

Abstract

Query rewriting plays a pivotal role in Retrieval-Augmented Generation (RAG) by refining real-world queries of varying complexity. Existing approaches typically rely on outcome-supervised training or heuristic rules to guide the rewriting process. However, these paradigms often struggle to handle queries with varying levels of complexity, posing over- and under-refinement problems. We identify the root cause of these issues as the absence of supervision signals for intermediate steps. To fully construct and utilize such signals, we propose Q-PRM, a novel query rewriting framework. Q-PRM reformulates the rewriting process as a Markov Decision Process (MDP) composed of atomic rewriting steps. In this way, Q-PRM can apply process-level supervision to each atomic step according to the query type, offering more targeted and effective guidance. Q-PRM comprises three key stages: (1) applying Monte Carlo Tree Search to generate step-level process supervision signals; (2) performing reinforced self-training for progressive process refinement; and (3) employing PRM-guided decoding during inference. Experiments on several open-domain QA benchmarks demonstrate that Q-PRM consistently outperforms baselines across different levels of query complexity.
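To make stage (3) concrete, here is a minimal sketch of what PRM-guided decoding over atomic rewriting steps might look like. It is not the authors' implementation: the abstract does not specify the action space, the search strategy, or the scoring interface, so the greedy selection, the `STOP` action, and the names `prm_guided_rewrite`, `propose_steps`, and `score_step` are all assumptions made for illustration. The toy proposer and scorer stand in for a rewriting model and a trained process reward model.

```python
from typing import Callable, List, Sequence

STOP = "<stop>"  # hypothetical action meaning "stop rewriting here"


def prm_guided_rewrite(
    query: str,
    propose_steps: Callable[[str], Sequence[str]],
    score_step: Callable[[str, str], float],
    max_steps: int = 4,
) -> List[str]:
    """Greedy step-level decoding (an assumed strategy, for illustration).

    At each state, every candidate atomic rewrite plus STOP is scored by the
    process reward model, and the highest-scoring action is taken.
    Returns the trajectory of queries, starting with the original query.
    """
    trajectory = [query]
    state = query
    for _ in range(max_steps):
        candidates = list(propose_steps(state)) + [STOP]
        best = max(candidates, key=lambda c: score_step(state, c))
        if best == STOP:
            break  # the scorer prefers leaving the query as it is
        trajectory.append(best)
        state = best
    return trajectory


# --- Toy stand-ins (hypothetical, not from the paper) ---

def toy_propose(q: str) -> List[str]:
    """Propose atomic steps: drop a filler prefix, or append a vague term."""
    steps = []
    if q.startswith("hey, "):
        steps.append(q[len("hey, "):])
    if "year" not in q:
        steps.append(q + " year")
    return steps


def toy_score(state: str, candidate: str) -> float:
    """Toy step score: reward removing filler, penalize padding the query."""
    if candidate == STOP:
        return 0.0
    score = 0.0
    if state.startswith("hey, ") and not candidate.startswith("hey, "):
        score += 1.0
    if candidate.endswith(" year") and not state.endswith(" year"):
        score -= 0.5
    return score


noisy = prm_guided_rewrite("hey, who won the world cup", toy_propose, toy_score)
clean = prm_guided_rewrite("who won the world cup", toy_propose, toy_score)
print(noisy)
print(clean)
```

In this toy setup the noisy query takes one rewriting step and then stops, while the already clean query is left untouched. That is the adaptive behavior step-level scoring is meant to allow: a step that does not help scores below `STOP`, which guards against over-refinement, and a helpful step scores above it, which guards against under-refinement.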

Anthology ID: 2025.findings-emnlp.817
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 15113–15128
URL: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.817/
DOI: 10.18653/v1/2025.findings-emnlp.817
Cite (ACL): Xiaopeng Ye, Chen Xu, Chaoliang Zhang, Zhaocheng Du, Jun Xu, Gang Wang, and Zhenhua Dong. 2025. Q-PRM: Adaptive Query Rewriting for Retrieval-Augmented Generation via Step-level Process Supervision. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 15113–15128, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Q-PRM: Adaptive Query Rewriting for Retrieval-Augmented Generation via Step-level Process Supervision (Ye et al., Findings 2025)
PDF: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.817.pdf
Checklist: 2025.findings-emnlp.817.checklist.pdf
