Xin Peng
2026
ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling
Jianghao Lin | Yuanyuan Shi | Xin Peng | Renjie Ding | Hairui Wang | Yuxuan Peng | Bizhe Bai | Weixi Song | Fengshuo Bai | Huacan Chai | Weinan Zhang | Fei Huang | Ying Wen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) excel at function calling, but inference scaling has been explored mainly for unstructured generation. We propose an inference-scaling framework for structured outputs that combines fine-grained beam search with ToolPRM, a process reward model scoring each intra-call decision (function name and argument filling). We build the first fine-grained intra-call supervision dataset via function masking, rollout collection, and step-level annotation. ToolPRM outperforms outcome and coarse-grained reward models in predictive accuracy and yields consistent test-time gains on multiple function-calling benchmarks. We further show that structured generation follows “explore more but retain less”, since early JSON errors are unrecoverable.
Progra: Progress-Aware Reinforcement Learning for Multi-Turn Function Calling
Huacan Chai | Zijie Cao | Maolin Ran | Yingxuan Yang | Jianghao Lin | Xin Peng | Hairui Wang | Renjie Ding | Ziyu Wan | Muning Wen | Weiwen Liu | Weinan Zhang | Fei Huang | Ying Wen
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) have achieved impressive success in single-turn function calling, yet real-world applications such as travel planning or multi-stage data analysis typically unfold across multi-turn conversations. In these settings, LLMs must not only issue accurate function calls at each step but also maintain progress awareness, the ability to summarize past interactions and plan future actions to ensure coherent, long-horizon task execution. Existing approaches, however, either reduce multi-turn training to isolated single-turn samples, which neglects task-level planning, or employ end-to-end reinforcement learning (RL) that struggles with redundancy and lacks explicit integration of progress awareness. To overcome these limitations, we introduce Progra, a framework that explicitly incorporates progress awareness into LLM training for multi-turn function calling. Progra combines (i) a Progress Awareness Generation (PAG) pipeline, which automatically constructs datasets coupling conversation summaries with future task planning, and (ii) a Progress Awareness-Guided Reinforcement Learning (PAG-RL) algorithm, which integrates progress awareness into RL training to reduce contextual redundancy and improve alignment between local actions and global task completion. Empirical results on two public benchmarks demonstrate that Progra significantly outperforms existing methods, highlighting the effectiveness of progress awareness in enabling robust and efficient multi-turn function calling. Our code is available at https://github.com/FatCatCHC/Progra .
Taming System Complexity: Demystifying Software Engineering Agents in Diagnosing Linux Kernel Faults
Zhenhao Zhou | Zhuochen Huang | Yike He | Chong Wang | Jiajun Wang | Yijian Wu | Xin Peng | Yiling Lou
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The Linux kernel is a critical system, serving as the foundation for numerous systems. Bugs in the Linux kernel can cause serious consequences, affecting billions of users. Fault localization (FL), which aims at identifying the buggy code elements in software, plays an essential role in software quality assurance. While recent LLM agents have achieved promising accuracy in FL on recent benchmarks like SWE-bench, it remains unclear how well these methods perform in the Linux kernel, where FL is much more challenging due to the large-scale code base, limited observability, and diverse impact factors. In this paper, we introduce LinuxFLBench, an FL benchmark constructed from real-world Linux kernel bugs. We conduct an empirical study to assess the performance of state-of-the-art LLM agents on the Linux kernel. Our initial results reveal that existing agents struggle with this task, achieving a best top-1 accuracy of only 41.6% at file level. To address this challenge, we propose LinuxFL+, an enhancement framework designed to improve FL effectiveness of LLM agents for the Linux kernel. LinuxFL+ substantially improves the FL accuracy of all studied agents (e.g., an accuracy increase of 7.2% to 11.2%) at minimal cost.
2024
ZSEE: A Dataset based on Zeolite Synthesis Event Extraction for Automated Synthesis Platform
Song He | Xin Peng | Yihan Cai | Xin Li | Zhiqing Yuan | WenLi Du | Weimin Yang
Findings of the Association for Computational Linguistics: NAACL 2024
Automated synthesis of zeolite, one of the most important catalysts in chemical industries, holds great significance for attaining economic and environmental benefits. Structural synthesis data extracted through NLP technologies from zeolite experimental procedures can significantly expedite automated synthesis owing to its machine readability. However, the utilization of NLP technologies in information extraction of zeolite synthesis remains restricted due to the lack of annotated datasets. In this paper, we formulate an event extraction task to mine structural synthesis actions from experimental narratives for modular automated synthesis. Furthermore, we introduce ZSEE, a novel dataset containing fine-grained event annotations of zeolite synthesis actions. Our dataset features 16 event types and 13 argument roles which cover all the experimental operational steps of zeolite synthesis. We explore current state-of-the-art event extraction methods on ZSEE, perform error analysis based on the experimental results, and summarize the challenges and corresponding research directions to further facilitate the automated synthesis of zeolites. The code is publicly available at https://github.com/Hi-0317/ZSEE.
Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation
Tong Su | Xin Peng | Sarubi Thillainathan | David Guzmán | Surangika Ranathunga | En-Shiun Lee
Findings of the Association for Computational Linguistics: NAACL 2024
Parameter-efficient fine-tuning (PEFT) methods are increasingly vital in adapting large-scale pre-trained language models for diverse tasks, offering a balance between adaptability and computational efficiency. They are important in Low-Resource Language (LRL) Neural Machine Translation (NMT) to enhance translation accuracy with minimal resources. However, their practical effectiveness varies significantly across different languages. We conducted comprehensive empirical experiments with varying LRL domains and sizes to evaluate the performance of 8 PEFT methods with a total of 15 architectures using the SacreBLEU score. We showed that 6 PEFT architectures outperform the baseline for both in-domain and out-of-domain tests and the Houlsby+Inversion adapter has the best performance overall, proving the effectiveness of PEFT methods.
Co-authors
- Huacan Chai 2
- Renjie Ding 2
- Fei Huang 2
- Jianghao Lin 2
- Hairui Wang 2
- Ying Wen 2
- Weinan Zhang 2
- Bizhe Bai 1
- Fengshuo Bai 1
- Yihan Cai 1
- Zijie Cao 1
- WenLi Du 1
- David Guzmán 1
- Song He 1
- Yike He 1
- Zhuochen Huang 1
- En-Shiun Lee 1
- Xin Li 1
- Weiwen Liu 1
- Yiling Lou 1
- Yuxuan Peng 1
- Maolin Ran 1
- Surangika Ranathunga 1
- Yuanyuan Shi 1
- Weixi Song 1
- Tong Su 1
- Sarubi Thillainathan 1
- Ziyu Wan 1
- Chong Wang 1
- Jiajun Wang 1
- Muning Wen 1
- Yijian Wu 1
- Weimin Yang 1
- Yingxuan Yang 1
- Zhiqing Yuan 1
- Zhenhao Zhou 1