Xudong Guo
2026
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints
Yinger Zhang | Shutong Jiang | Renhao Li | Jianhong Tu | Yang Su | Lianghao Deng | Xudong Guo | ChenXu Lv | Junyang Lin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While agent evaluation has shifted toward long-horizon tasks, most benchmarks still emphasize local, step-level reasoning rather than the global constrained optimization (e.g., time and financial budgets) that demands genuine planning ability. Meanwhile, existing LLM planning benchmarks underrepresent the active information gathering and fine-grained local constraints typical of real-world settings. To address this, we introduce DeepPlanning, a challenging benchmark for practical long-horizon agent planning. It features multi-day travel planning and multi-product shopping tasks that require proactive information acquisition, local constrained reasoning, and global constrained optimization. Evaluations on DeepPlanning show that even frontier agentic LLMs struggle with these problems, highlighting the importance of reliable explicit reasoning patterns and parallel tool use for achieving better effectiveness-efficiency trade-offs. Error analysis further points to promising directions for improving agentic LLMs over long planning horizons. We open-source the code and data to support future research.
2025
LIST: Linearly Incremental SQL Translator for Single-Hop Reasoning, Generation and Verification
Kaiyuan Guan | Ruoxin Li | Xudong Guo | Zhenning Huang | Xudong Weng | Hehuan Liu | Zheng Wei | Zang Li
Findings of the Association for Computational Linguistics: ACL 2025
SQL queries often feature nested structures that require robust interaction with databases. Beyond the well-validated schema linking methods built on PLMs and LLMs, we introduce the Linearly Incremental SQL Translator (LIST), a novel algorithmic toolkit designed to leverage the notable reasoning and tool interaction capabilities inherent in LLMs. LIST transforms complex SQL queries into grammatically verifiable sub-queries arranged sequentially to reflect single-hop reasoning steps, improving both the granularity and accuracy of database interactions. With in-context learning, our experiments show significant improvements, reaching 60.56% and 56.32% on the BIRD dataset with GPT-4o and Llama-3-70B-Instruct, respectively. To the best of our knowledge, this is SOTA performance among non-schema-linking methods, and it also surpasses a series of schema-linking-based approaches at comparable or lower cost.