Wenhao Jiang
2026
Trust Within? Seek Beyond? Knowledge Boundary Aware Policy Optimization for Agentic Search
Tao Feng | Xinke Jiang | Xinyan Hu | Yonggang Zhang | Zhen Tao | Wentao Zhang | Boyang Liu | Wenhao Jiang | Chao Wu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Agentic search augments large language models (LLMs) with external knowledge through reinforcement learning. However, existing approaches suffer from two failure modes: blind reliance on noisy retrieval, and hallucination when both parametric and external knowledge fail. Both reflect a lack of calibration regarding the model’s knowledge boundary. We propose Knowledge boundary Policy Optimization (KbPO), a reinforcement learning framework that explicitly aligns retrieval decisions with quantified knowledge states. KbPO introduces: (1) a semantic stability metric to delineate reliable parametric knowledge; (2) a four-quadrant taxonomy synthesising internal certainty with retrieval quality; and (3) a quadrant-based reward mechanism incentivising calibrated behaviour. We further adopt an iterative query evolution pipeline to construct boundary-probing training samples. Experiments on ten benchmarks demonstrate that KbPO outperforms strong baselines while exhibiting reduced hallucination rates.
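The abstract gives no formulas, so the following is only an illustrative sketch of how a stability score, a four-quadrant split, and a quadrant-based reward could fit together. Everything here is an assumption rather than the paper's definition: the agreement-based stability score, the thresholds `tau_s` and `tau_r`, the action names, the preferred action per quadrant, and the 0/1 reward values.

```python
from collections import Counter


def semantic_stability(answers):
    """Hypothetical stability score: share of sampled answers that agree
    with the most frequent one (after simple normalisation)."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalized = [a.strip().lower() for a in answers]
    top_count = Counter(normalized).most_common(1)[0][1]
    return top_count / len(normalized)


def quadrant(stability, retrieval_quality, tau_s=0.8, tau_r=0.5):
    """Map the two signals to (internally_certain, retrieval_useful).
    The thresholds are illustrative assumptions."""
    return (stability >= tau_s, retrieval_quality >= tau_r)


# Assumed preferred behaviour per quadrant (not taken from the paper).
PREFERRED_ACTION = {
    (True, True): "answer",    # parametric knowledge is reliable
    (True, False): "answer",   # do not defer to noisy retrieval
    (False, True): "search",   # rely on good external evidence
    (False, False): "abstain", # neither source is reliable
}


def quadrant_reward(action, stability, retrieval_quality):
    """Toy reward: 1.0 when the action matches the quadrant's preferred
    behaviour, 0.0 otherwise."""
    preferred = PREFERRED_ACTION[quadrant(stability, retrieval_quality)]
    return 1.0 if action == preferred else 0.0


samples = ["Paris", "paris ", "Lyon", "Paris"]
s = semantic_stability(samples)
print(s, quadrant(s, 0.9), quadrant_reward("search", s, 0.9))
```

In this toy setup the sampled answers agree only three times out of four, so the model counts as internally uncertain at the assumed threshold, and searching is rewarded because the retrieval score is high.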