RAG-Zeval: Enhancing RAG Responses Evaluator through End-to-End Reasoning and Ranking-Based Reinforcement Learning

Kun Li, Yunxiang Li, Tianhua Zhang, Hongyin Luo, Xixin Wu, James R. Glass, Helen M. Meng

Abstract

Robust evaluation is critical for deploying trustworthy retrieval-augmented generation (RAG) systems. However, current LLM-based evaluation frameworks predominantly rely on directly prompting resource-intensive models with complex multi-stage prompts, which underutilizes the models’ reasoning capabilities and incurs significant computational cost. In this paper, we present RAG-Zeval (RAG-Zero Evaluator), a novel end-to-end framework that formulates faithfulness and correctness evaluation of RAG systems as a rule-guided reasoning task. Our approach trains evaluators with reinforcement learning, enabling compact models to generate comprehensive and sound assessments with detailed explanations in a single pass. We introduce a ranking-based outcome reward mechanism that uses preference judgments rather than absolute scores, addressing the challenge of obtaining precise pointwise reward signals. To this end, we synthesize the ranking references by generating quality-controlled responses, with zero human annotation. Experiments demonstrate RAG-Zeval’s superior performance: it achieves the strongest correlation with human judgments and outperforms baselines that rely on LLMs with 10–100× more parameters. Our approach also exhibits superior interpretability in response evaluation.
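The abstract describes the outcome reward only at a high level: the evaluator is rewarded for agreeing with a reference ranking of candidate responses rather than for matching absolute scores. Below is a minimal Python sketch of one such reward, assuming a pairwise-agreement formulation. This is an illustrative choice, not the paper's definition; the function name `pairwise_ranking_reward` and its inputs are hypothetical, and RAG-Zeval's actual reward may be computed differently.

```python
from itertools import combinations


def pairwise_ranking_reward(predicted_scores, reference_ranks):
    """Hypothetical ranking-based reward (not the authors' exact formula).

    predicted_scores: evaluator's score per candidate response (higher = better).
    reference_ranks:  reference rank per candidate (1 = best).
    Returns the fraction of strictly ordered reference pairs whose order
    the evaluator's scores reproduce.
    """
    if len(predicted_scores) != len(reference_ranks):
        raise ValueError("score and rank lists must have equal length")

    # Only pairs with a strict reference preference carry a ranking signal.
    pairs = [
        (i, j)
        for i, j in combinations(range(len(reference_ranks)), 2)
        if reference_ranks[i] != reference_ranks[j]
    ]
    if not pairs:
        return 0.0

    agree = 0
    for i, j in pairs:
        ref_i_better = reference_ranks[i] < reference_ranks[j]
        if predicted_scores[i] == predicted_scores[j]:
            continue  # a predicted tie earns no credit for this pair
        pred_i_better = predicted_scores[i] > predicted_scores[j]
        if pred_i_better == ref_i_better:
            agree += 1
    return agree / len(pairs)


if __name__ == "__main__":
    # Two of the three pairs agree with the reference order.
    print(pairwise_ranking_reward([0.9, 0.7, 0.4], [1, 3, 2]))
```

For example, with scores [0.9, 0.7, 0.4] against reference ranks [1, 3, 2], the evaluator orders two of the three pairs correctly, giving a reward of about 0.667. Because a pairwise formulation only needs relative preferences between responses, it avoids having to specify a precise "correct" score for each individual response.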

Anthology ID: 2025.emnlp-main.1267
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 24936–24954
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1267/
Cite (ACL): Kun Li, Yunxiang Li, Tianhua Zhang, Hongyin Luo, Xixin Wu, James R. Glass, and Helen M. Meng. 2025. RAG-Zeval: Enhancing RAG Responses Evaluator through End-to-End Reasoning and Ranking-Based Reinforcement Learning. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 24936–24954, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): RAG-Zeval: Enhancing RAG Responses Evaluator through End-to-End Reasoning and Ranking-Based Reinforcement Learning (Li et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1267.pdf
Checklist: 2025.emnlp-main.1267.checklist.pdf