Zhongwang Zhang
2026
BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents
Litu Ou | Kuan Li | Huifeng Yin | Liwen Zhang | Zhongwang Zhang | Xixi Wu | Rui Ye | Zile Qiao | Yong Jiang | Pengjun Xie | Fei Huang | Jingren Zhou
Findings of the Association for Computational Linguistics: ACL 2026
Confidence in LLMs is a useful indicator of model uncertainty and answer reliability. Existing work has mainly focused on single-turn scenarios, while research on confidence in complex multi-turn interactions remains limited. In this paper, we investigate whether LLM-based search agents can communicate their own confidence through verbalized confidence scores after long sequences of actions, a significantly more challenging task than outputting confidence in a single interaction. Experimenting on open-source agentic models, we first find that models exhibit much higher task accuracy when confidence is high, while accuracy is near zero when confidence is low. Based on this observation, we propose Test-Time Scaling (TTS) methods that use confidence scores to determine answer quality and encourage the model to try again until it reaches a satisfactory confidence level. Results show that our proposed methods significantly reduce token consumption while achieving performance competitive with baseline fixed-budget TTS methods.
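The confidence-gated retry loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical choices:

* the names `run_agent`, `threshold` and `max_attempts`;
* the 0 to 100 confidence scale;
* the fallback to the most confident answer when no attempt clears the threshold.

```python
from typing import Callable, Optional, Tuple

# Hypothetical agent interface: one full search episode (indexed by attempt
# number) returning (final_answer, verbalized_confidence on a 0-100 scale).
AgentRun = Callable[[int], Tuple[str, float]]


def confidence_gated_retry(
    run_agent: AgentRun,
    threshold: float = 80.0,
    max_attempts: int = 5,
) -> Tuple[str, float, int]:
    """Re-run the agent until its verbalized confidence reaches `threshold`.

    Returns (answer, confidence, attempts_used). If no attempt reaches the
    threshold within `max_attempts`, it falls back to the most confident
    answer seen. This fallback is an assumed rule, not necessarily the paper's.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best: Optional[Tuple[str, float]] = None
    for attempt in range(1, max_attempts + 1):
        answer, confidence = run_agent(attempt)
        # Track the most confident answer in case no attempt clears the bar.
        if best is None or confidence > best[1]:
            best = (answer, confidence)
        # Early stop: the remaining attempts (and their tokens) are skipped.
        if confidence >= threshold:
            return answer, confidence, attempt
    assert best is not None
    return best[0], best[1], max_attempts


if __name__ == "__main__":
    # Scripted stub episodes with fixed confidences, for illustration only.
    scripted = {1: ("Paris", 40.0), 2: ("Lyon", 65.0), 3: ("Paris", 90.0)}
    print(confidence_gated_retry(lambda i: scripted.get(i, ("?", 0.0))))
```

Compared with a fixed-budget scheme that always runs `max_attempts` episodes, the early stop spends extra episodes only on questions where the model remains unsure.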
Nested Browser-Use Learning for Agentic Information Seeking
Baixuan Li | Jialong Wu | Wenbiao Yin | Kuan Li | Zhongwang Zhang | Huifeng Yin | Zhengwei Tao | Liwen Zhang | Pengjun Xie | Jingren Zhou | Yong Jiang | Wentao Zhang | Zhiqiang Gao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching, limiting access to the richer information available through real browsing. While full browser interaction could unlock deeper capabilities, its fine-grained control and the verbose page content it returns introduce substantial complexity for ReAct-style function-calling agents. To bridge this gap, we propose Nested Browser-Use Learning (NestBrowse), which introduces a minimal and complete browser-action framework that decouples interaction control from page exploration through a nested structure. This design simplifies agentic reasoning while enabling effective deep-web information acquisition. Empirical results on challenging deep IS benchmarks demonstrate that NestBrowse offers clear benefits in practice. Further in-depth analyses underscore its efficiency.
2025
Understanding the Language Model to Solve the Symbolic Multi-Step Reasoning Problem from the Perspective of Buffer Mechanism
Zhiwei Wang | Yunji Wang | Zhongwang Zhang | Zhangchen Zhou | Hui Jin | Tianyang Hu | Jiacheng Sun | Zhenguo Li | Yaoyu Zhang | Zhi-Qin John Xu
Findings of the Association for Computational Linguistics: EMNLP 2025
Large language models have consistently struggled with complex reasoning tasks, such as mathematical problem-solving. Investigating the internal reasoning mechanisms of these models can help us design better model architectures and training strategies, ultimately enhancing their reasoning capability. In this study, we constructed a symbolic multi-step reasoning task to investigate the information propagation mechanisms in Transformer models when solving the task through direct answering and Chain-of-Thought (CoT) reasoning. We introduced the concept of a buffer mechanism: the model stores different pieces of information in distinct buffers and selectively extracts them through the query-key matrix. We proposed a random matrix-based algorithm to enhance the model's reasoning ability. This algorithm introduces only 132 trainable parameters, yet leads to significant performance improvements on 7 multi-step reasoning datasets, including PrOntoQA, LogicAsker, and LogicInference. These findings provide new insights into understanding large language models.
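The buffer idea can be illustrated with a toy example. This is an assumption-laden sketch, not the paper's construction or its random matrix-based algorithm.

* Two pieces of information are written into orthogonal subspaces ("buffers") of one hidden vector.
* A query-key matrix that projects onto only one subspace reads that buffer and ignores the other.

The names `write`, `score`, `B1`, `B2` and `W_qk`, and the hidden size of 16, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hypothetical hidden size

# Two "buffers": orthonormal bases of disjoint 8-dim subspaces of R^16,
# taken from the Q factor of a random square matrix (Q is orthogonal).
Q_basis, _ = np.linalg.qr(rng.standard_normal((d, d)))
B1, B2 = Q_basis[:, :8], Q_basis[:, 8:]


def write(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    """Hidden vector storing x1 in buffer 1 and x2 in buffer 2."""
    return B1 @ x1 + B2 @ x2


# Query-key matrix that reads only buffer 1 on both the query and key side.
W_qk = B1 @ B1.T


def score(h_query: np.ndarray, h_key: np.ndarray) -> float:
    """Bilinear attention logit h_query^T W_qk h_key."""
    return float(h_query @ W_qk @ h_key)


if __name__ == "__main__":
    a, b, u, v = (rng.standard_normal(8) for _ in range(4))
    # Same buffer-1 content, different buffer-2 content: logit is a . a.
    print(score(write(a, u), write(a, v)), a @ a)
    # Different buffer-1 content: logit is a . b, unaffected by u and v.
    print(score(write(a, u), write(b, v)), a @ b)
```

Because `B1.T @ B2` is zero and `B1.T @ B1` is the identity, the logit reduces to the dot product of the buffer-1 contents. Changing what is stored in buffer 2 therefore leaves the attention pattern unchanged.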