Jianghao Lin
2026
ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling
Jianghao Lin | Yuanyuan Shi | Xin Peng | Renjie Ding | Hairui Wang | Yuxuan Peng | Bizhe Bai | Weixi Song | Fengshuo Bai | Huacan Chai | Weinan Zhang | Fei Huang | Ying Wen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) excel at function calling, but inference scaling has been explored mainly for unstructured generation. We propose an inference-scaling framework for structured outputs that combines fine-grained beam search with ToolPRM, a process reward model scoring each intra-call decision (function name and argument filling). We build the first fine-grained intra-call supervision dataset via function masking, rollout collection, and step-level annotation. ToolPRM outperforms outcome and coarse-grained reward models in predictive accuracy and yields consistent test-time gains on multiple function-calling benchmarks. We further show that structured generation follows “explore more but retain less”, since early JSON errors are unrecoverable.
A Survey of Large Language Model-Based Search Agents
Yunjia Xi | Jianghao Lin | Yongzhao Xiao | Zheli Zhou | Rong Shan | Te Gao | Jiachen Zhu | Weiwen Liu | Yong Yu | Weinan Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The advent of Large Language Models (LLMs) has significantly revolutionized web search. The emergence of LLM-based Search Agents marks a pivotal shift towards deeper, dynamic, autonomous information seeking. These agents can comprehend user intentions and environment context and execute multi-turn retrieval with dynamic planning, extending search capabilities far beyond the web. Leading examples like OpenAI’s Deep Research highlight their potential for deep information mining and real-world applications. This survey provides the first systematic analysis of search agents. We comprehensively analyze and categorize existing works from the perspectives of architecture, optimization, application, and evaluation, ultimately identifying critical open challenges and outlining promising future research directions in this rapidly evolving field.
Progra: Progress-Aware Reinforcement Learning for Multi-Turn Function Calling
Huacan Chai | Zijie Cao | Maolin Ran | Yingxuan Yang | Jianghao Lin | Xin Peng | Hairui Wang | Renjie Ding | Ziyu Wan | Muning Wen | Weiwen Liu | Weinan Zhang | Fei Huang | Ying Wen
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) have achieved impressive success in single-turn function calling, yet real-world applications such as travel planning or multi-stage data analysis typically unfold across multi-turn conversations. In these settings, LLMs must not only issue accurate function calls at each step but also maintain progress awareness, the ability to summarize past interactions and plan future actions to ensure coherent, long-horizon task execution. Existing approaches, however, either reduce multi-turn training to isolated single-turn samples, which neglects task-level planning, or employ end-to-end reinforcement learning (RL) that struggles with redundancy and lacks explicit integration of progress awareness. To overcome these limitations, we introduce Progra, a framework that explicitly incorporates progress awareness into LLM training for multi-turn function calling. Progra combines (i) a Progress Awareness Generation (PAG) pipeline, which automatically constructs datasets coupling conversation summaries with future task planning, and (ii) a Progress Awareness-Guided Reinforcement Learning (PAG-RL) algorithm, which integrates progress awareness into RL training to reduce contextual redundancy and improve alignment between local actions and global task completion. Empirical results on two public benchmarks demonstrate that Progra significantly outperforms existing methods, highlighting the effectiveness of progress awareness in enabling robust and efficient multi-turn function calling. Our code is available at https://github.com/FatCatCHC/Progra .
A Comprehensive Survey of Process Reward Models: Data Generation, Model Construction, and Usage
Congmin Zheng | Jiachen Zhu | Zhuoying Ou | Yuxiang Chen | Kangning Zhang | Rong Shan | Zeyu Zheng | Mengyue Yang | Jianghao Lin | Yong Yu | Weinan Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have advanced reasoning ability, yet conventional alignment remains dominated by outcome reward models (ORMs) that judge only final answers. Process Reward Models (PRMs) address this gap by evaluating and guiding reasoning at the step or trajectory level. This survey provides a systematic overview of PRMs through the full loop: how to generate process data, build PRMs, and use PRMs for test-time scaling and reinforcement learning. We summarize applications across math, code, text, multimodal reasoning, robotics, and agents, and review emerging benchmarks. Our goal is to clarify design spaces, reveal open challenges, and guide future research toward fine-grained, robust reasoning alignment.
2025
Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning
Jiachen Zhu | Congmin Zheng | Jianghao Lin | Kounianhua Du | Ying Wen | Yong Yu | Jun Wang | Weinan Zhang
Findings of the Association for Computational Linguistics: ACL 2025
While large language models (LLMs) have significantly advanced mathematical reasoning, Process Reward Models (PRMs) have been developed to evaluate the logical validity of reasoning steps. However, PRMs still struggle with out-of-distribution (OOD) challenges. This paper identifies the OOD issues, including step OOD, arising from differences in reasoning patterns across model types and sizes, and question OOD, due to dataset shifts between training and real-world problems. To address these issues, we introduce the Retrieval-Augmented Process Reward Model (RetrievalPRM), a novel framework built on a two-stage retrieval-enhanced mechanism. RetrievalPRM retrieves semantically similar questions and steps for the PRM as a warmup to stimulate its potential to judge target steps, improving generalization and reasoning consistency across different models and problem types. Our extensive experiments demonstrate that RetrievalPRM outperforms existing baselines across multiple real-world datasets. Our open-source contributions include a retrieval-enhanced dataset, a tuning framework for PRM training, and the RetrievalPRM model, establishing a new standard for PRM performance.
Co-authors
- Weinan Zhang 5
- Ying Wen 3
- Yong Yu 3
- Jiachen Zhu 3
- Huacan Chai 2
- Renjie Ding 2
- Fei Huang 2
- Weiwen Liu 2
- Xin Peng 2
- Rong Shan 2
- Hairui Wang 2
- Congmin Zheng 2
- Bizhe Bai 1
- Fengshuo Bai 1
- Zijie Cao 1
- Yuxiang Chen 1
- Kounianhua Du 1
- Te Gao 1
- Zhuoying Ou 1
- Yuxuan Peng 1
- Maolin Ran 1
- Yuanyuan Shi 1
- Weixi Song 1
- Ziyu Wan 1
- Jun Wang 1
- Muning Wen 1
- Yunjia Xi 1
- Yongzhao Xiao 1
- Yingxuan Yang 1
- Mengyue Yang 1
- Kangning Zhang 1
- Zeyu Zheng 1
- Zheli Zhou 1