R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning
Qingfei Zhao, Ruobing Wang, Dingling Xu, Daren Zha, Ma Bowen, Zhichun Wang, Shijie Jia, Limin Liu, Xin Wang
Abstract
Large language models (LLMs) have notably progressed in multi-step and long-chain reasoning. However, extending their reasoning capabilities to encompass deep interactions with search remains a non-trivial challenge, as models often fail to identify optimal reasoning–search interaction trajectories, resulting in suboptimal responses. We propose R-Search, a novel reinforcement learning framework for Reasoning–Search integration, designed to enable LLMs to autonomously execute multi-step reasoning with deep search interaction, and learn optimal reasoning–search interaction trajectories via multi-reward signals, improving response quality in complex logic- and knowledge-intensive tasks. R-Search guides the LLM to dynamically decide when to search or reason, while globally integrating key evidence to enhance deep knowledge interaction between reasoning and search. During RL training, R-Search provides multi-type rewards to jointly optimize the reasoning–search trajectory. Experiments on seven datasets show that R-Search significantly outperforms mainstream RAG baselines.
- Anthology ID:
- 2026.findings-acl.1896
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 38030–38046
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1896/
- Cite (ACL):
- Qingfei Zhao, Ruobing Wang, Dingling Xu, Daren Zha, Ma Bowen, Zhichun Wang, Shijie Jia, Limin Liu, and Xin Wang. 2026. R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 38030–38046, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning (Zhao et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1896.pdf