s3: You Don’t Need That Much Data to Train a Search Agent via RL

Pengcheng Jiang, Xueqiang Xu, Jiacheng Lin, Jinfeng Xiao, Zifeng Wang, Jimeng Sun, Jiawei Han


Abstract
Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference. Recent advances have enabled LLMs to act as search agents via reinforcement learning (RL), improving information acquisition through multi-turn interactions with retrieval engines. However, existing approaches either optimize retrieval using search-only metrics (e.g., NDCG) that ignore downstream utility or fine-tune the entire LLM to jointly reason and retrieve—entangling retrieval with generation and limiting the real search utility and compatibility with frozen or proprietary models. In this work, we propose **s3**, a lightweight, model-agnostic framework that decouples the searcher from the generator and trains the searcher using a Gain Beyond RAG reward: the improvement in generation accuracy over naïve RAG. **s3** requires only 2.4k training samples to outperform baselines trained on over 70 × more data, consistently delivering stronger downstream performance across six general QA and five medical QA benchmarks.
Anthology ID:
2025.emnlp-main.1095
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
21610–21628
Language:
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1095/
DOI:
Bibkey:
Cite (ACL):
Pengcheng Jiang, Xueqiang Xu, Jiacheng Lin, Jinfeng Xiao, Zifeng Wang, Jimeng Sun, and Jiawei Han. 2025. s3: You Don’t Need That Much Data to Train a Search Agent via RL. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 21610–21628, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
s3: You Don’t Need That Much Data to Train a Search Agent via RL (Jiang et al., EMNLP 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1095.pdf
Checklist:
 2025.emnlp-main.1095.checklist.pdf