Wenhao Jiang

2026

Agentic search augments large language models (LLMs) with external knowledge through reinforcement learning. However, existing approaches suffer from two failures: blind reliance on noisy retrieval, and hallucination when both parametric and external knowledge fail. Both reflect a lack of calibration regarding the model’s knowledge boundary. We propose Knowledge boundary Policy Optimization (KbPO), a reinforcement learning framework that explicitly aligns retrieval decisions with quantified knowledge states. KbPO introduces: (1) a semantic stability metric to delineate reliable parametric knowledge; (2) a four-quadrant taxonomy synthesising internal certainty with retrieval quality; and (3) a quadrant-based reward mechanism incentivising calibrated behaviour. We further adopt an iterative query evolution pipeline to construct boundary-probing training samples. Experiments on ten benchmarks demonstrate that KbPO outperforms strong baselines while exhibiting reduced hallucination rates.
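
The abstract does not give formulas for the stability metric or the rewards, so the following is only an illustrative sketch of how the three components might fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the agreement-based stand-in for semantic stability, the thresholds, the quadrant names, and all reward values.

```python
from collections import Counter
from enum import Enum


class Quadrant(Enum):
    """Hypothetical names for the four certainty/retrieval combinations."""
    CERTAIN_GOOD_RETRIEVAL = 1
    CERTAIN_POOR_RETRIEVAL = 2
    UNCERTAIN_GOOD_RETRIEVAL = 3
    UNCERTAIN_POOR_RETRIEVAL = 4


def semantic_stability(samples):
    """Crude stand-in metric (assumption): the share of sampled answers
    that match the most common answer after simple normalisation."""
    if not samples:
        return 0.0
    normalised = [s.strip().lower() for s in samples]
    top_count = Counter(normalised).most_common(1)[0][1]
    return top_count / len(normalised)


def classify(stability, retrieval_quality,
             stability_threshold=0.8, quality_threshold=0.5):
    """Map the two scores to a quadrant; both thresholds are assumed values."""
    certain = stability >= stability_threshold
    good = retrieval_quality >= quality_threshold
    if certain:
        return (Quadrant.CERTAIN_GOOD_RETRIEVAL if good
                else Quadrant.CERTAIN_POOR_RETRIEVAL)
    return (Quadrant.UNCERTAIN_GOOD_RETRIEVAL if good
            else Quadrant.UNCERTAIN_POOR_RETRIEVAL)


def quadrant_reward(quadrant, searched, correct, abstained):
    """Hypothetical reward table: abstaining is rewarded only when neither
    knowledge source is reliable, and search decisions that mismatch the
    model's internal certainty carry a small penalty."""
    if abstained:
        return 0.5 if quadrant is Quadrant.UNCERTAIN_POOR_RETRIEVAL else -0.5
    reward = 1.0 if correct else -1.0
    internally_certain = quadrant in (Quadrant.CERTAIN_GOOD_RETRIEVAL,
                                      Quadrant.CERTAIN_POOR_RETRIEVAL)
    if internally_certain and searched:
        reward -= 0.25  # searched although parametric knowledge was stable
    if not internally_certain and not searched:
        reward -= 0.25  # skipped search although parametric knowledge was unstable
    return reward


if __name__ == "__main__":
    stability = semantic_stability(["Paris", "paris ", "Lyon", "Paris"])
    quadrant = classify(stability, retrieval_quality=0.9)
    print(stability, quadrant,
          quadrant_reward(quadrant, searched=True, correct=True, abstained=False))
```

In this sketch the quadrant is computed first and the reward then depends on whether the agent's search and answer decisions suit that quadrant, which is one plausible reading of "aligning retrieval decisions with quantified knowledge states".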