Huifeng Yin
2026
Nested Browser-Use Learning for Agentic Information Seeking
Baixuan Li | Jialong Wu | Wenbiao Yin | Kuan Li | Zhongwang Zhang | Huifeng Yin | Zhengwei Tao | Liwen Zhang | Pengjun Xie | Jingren Zhou | Yong Jiang | Wentao Zhang | Zhiqiang Gao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching, limiting access to the richer information available through real browsing. While full browser interaction could unlock deeper capabilities, its fine-grained control and verbose page content returns introduce substantial complexity for ReAct-style function-calling agents. To bridge this gap, we propose Nested Browser-Use Learning (NestBrowse), which introduces a minimal and complete browser-action framework that decouples interaction control from page exploration through a nested structure. This design simplifies agentic reasoning while enabling effective deep-web information acquisition. Empirical results on challenging deep IS benchmarks demonstrate that NestBrowse offers clear benefits in practice. Further in-depth analyses underscore its efficiency.
BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents
Litu Ou | Kuan Li | Huifeng Yin | Liwen Zhang | Zhongwang Zhang | Xixi Wu | Rui Ye | Zile Qiao | Yong Jiang | Pengjun Xie | Fei Huang | Jingren Zhou
Findings of the Association for Computational Linguistics: ACL 2026
Confidence in LLMs is a useful indicator of model uncertainty and answer reliability. Existing work has mainly focused on single-turn scenarios, while research on confidence in complex multi-turn interactions is limited. In this paper, we investigate whether LLM-based search agents can communicate their own confidence through verbalized confidence scores after long sequences of actions, a significantly more challenging task than outputting confidence in a single interaction. Experimenting on open-source agentic models, we first find that models exhibit much higher task accuracy at high confidence while having near-zero accuracy when confidence is low. Based on this observation, we propose Test-Time Scaling (TTS) methods that use confidence scores to determine answer quality and encourage the model to try again until it reaches a satisfactory confidence level. Results show that our proposed methods significantly reduce token consumption while demonstrating competitive performance compared to baseline fixed-budget TTS methods.
2025
Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models
Huifeng Yin | Yu Zhao | Minghao Wu | Xuanfan Ni | Bo Zeng | Hao Wang | Tianqi Shi | Liangying Shao | Chenyang Lyu | Longyue Wang | Weihua Luo | Kaifu Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Reasoning Models (LRMs) such as OpenAI o1 and DeepSeek-R1 have shown remarkable reasoning capabilities by scaling test-time compute and generating long Chain-of-Thought (CoT). Distillation post-training on LRM-generated data is a straightforward yet effective method to enhance the reasoning abilities of smaller models, but faces a critical bottleneck: we found that distilled long CoT data poses learning difficulty for small models and leads to the inheritance of biases (i.e., formalistic long-time thinking) when using Supervised Fine-tuning (SFT) and Reinforcement Learning (RL) methods. To alleviate this bottleneck, we propose constructing data from scratch using Monte Carlo Tree Search (MCTS). We then exploit a set of CoT-aware approaches, including Thoughts Length Balance, Fine-grained DPO, and Joint Post-training Objective, to enhance SFT and RL on the MCTS data. We conducted evaluations on various benchmarks covering math (GSM8K, MATH, AIME), instruction-following (Multi-IF), and planning (Blocksworld). Results demonstrate that our CoT-aware approaches substantially improve the reasoning performance of distilled models compared to standard distilled models by reducing hallucinations in long-time thinking.