Yinger Zhang
2026
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints
Yinger Zhang | Shutong Jiang | Renhao Li | Jianhong Tu | Yang Su | Lianghao Deng | Xudong Guo | ChenXu Lv | Junyang Lin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Yinger Zhang | Shutong Jiang | Renhao Li | Jianhong Tu | Yang Su | Lianghao Deng | Xudong Guo | ChenXu Lv | Junyang Lin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While agent evaluation has shifted toward long-horizon tasks, most benchmarks still emphasize local, step-level reasoning rather than the global constrained optimization (e.g., time and financial budgets) that demands genuine planning ability. Meanwhile, existing LLM planning benchmarks underrepresent the active information gathering and fine-grained local constraints typical of real-world settings. To address this, we introduce DeepPlanning, a challenging benchmark for practical long-horizon agent planning. It features multi-day travel planning and multi-product shopping tasks that require proactive information acquisition, local constrained reasoning, and global constrained optimization. Evaluations on DeepPlanning show that even frontier agentic LLMs struggle with these problems, highlighting the importance of reliable explicit reasoning patterns and parallel tool use for achieving better effectiveness-efficiency trade-offs. Error analysis further points to promising directions for improving agentic LLMs over long planning horizons. We open-source the code and data to support future research.
2024
Reverse Chain: A Generic-Rule for LLMs to Master Multi-API Planning
Yinger Zhang | Hui Cai | Xierui Song | Yicheng Chen | Rui Sun | Jing Zheng
Findings of the Association for Computational Linguistics: NAACL 2024
Yinger Zhang | Hui Cai | Xierui Song | Yicheng Chen | Rui Sun | Jing Zheng
Findings of the Association for Computational Linguistics: NAACL 2024
While enabling large language models to implement function calling (known as APIs) can greatly enhance the performance of Large Language Models (LLMs), function calling is still a challenging task due to the complicated relations between different APIs, especially in a context-learning setting without fine-tuning. This paper introduces “Reverse Chain”, a controllable, target-driven approach designed to empower LLMs with the capability to operate external APIs only via prompts. Recognizing that most LLMs have limited tool-use capabilities, Reverse Chain limits LLMs to executing simple tasks, e.g., API Selection and Argument Completion. Furthermore, to manage a controllable multi-function calling, Reverse Chain adopts a generic rule-based on a backward reasoning process. This rule determines when to do API selection or Argument completion. To evaluate the multi-tool-use capability of LLMs, we have released a compositional multi-tool task dataset, available at https://github.com/zhangyingerjelly/reverse-chain. Extensive numerical experiments validate the remarkable proficiency of Reverse Chain in managing multiple API calls.