Trust Within? Seek Beyond? Knowledge Boundary Aware Policy Optimization for Agentic Search

Tao Feng; Xinke Jiang; Xinyan Hu; Yonggang Zhang; Zhen Tao; Wentao Zhang; Boyang Liu; Wenhao Jiang; Chao Wu

Abstract

Agentic search augments large language models (LLMs) with external knowledge through reinforcement learning. However, existing approaches suffer from blind reliance on noisy retrieval and hallucination when both parametric and external knowledge fail—reflecting a lack of calibration regarding the model’s knowledge boundary. We propose Knowledge boundary Policy Optimization (KbPO), a reinforcement learning framework that explicitly aligns retrieval decisions with quantified knowledge states. KbPO introduces: (1) a semantic stability metric to delineate reliable parametric knowledge; (2) a four-quadrant taxonomy synthesising internal certainty with retrieval quality; and (3) a quadrant-based reward mechanism incentivising calibrated behaviour. We further adopt an iterative query evolution pipeline to construct boundary-probing training samples. Experiments on ten benchmarks demonstrate that KbPO outperforms strong baselines while exhibiting reduced hallucination rates.

Anthology ID: 2026.acl-long.1276
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 27664–27682
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1276/
Cite (ACL): Tao Feng, Xinke Jiang, Xinyan Hu, Yonggang Zhang, Zhen Tao, Wentao Zhang, Boyang Liu, Wenhao Jiang, and Chao Wu. 2026. Trust Within? Seek Beyond? Knowledge Boundary Aware Policy Optimization for Agentic Search. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 27664–27682, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Trust Within? Seek Beyond? Knowledge Boundary Aware Policy Optimization for Agentic Search (Feng et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1276.pdf
Checklist: 2026.acl-long.1276.checklist.pdf
