Xudong Guo
2026
MemPO: Self-Memory Policy Optimization for Long-Horizon Agents
Ruoran Li | Xinghua Zhang | Haiyang Yu | Shitong Duan | Xiang Li | Wenxin Xiang | Chonghua Liao | Xudong Guo | Yongbin Li | Jinli Suo
Findings of the Association for Computational Linguistics: ACL 2026
Long-horizon agents face the challenge of a growing context size during interaction with the environment, which degrades performance and stability. Existing methods typically introduce an external memory module and look up relevant information from the stored memory, which prevents the model itself from proactively managing its memory content and aligning it with the agent’s overarching task objectives. To address these limitations, we propose the self-memory policy optimization algorithm (MemPO), which enables the agent (policy model) to autonomously summarize and manage its memory during interaction with the environment. By improving the credit assignment mechanism based on memory effectiveness, the policy model can selectively retain crucial information, significantly reducing token consumption while preserving task performance. Extensive experiments and analyses confirm that MemPO achieves absolute F1 score gains of 25.98 over the base model and 7.1 over the previous SOTA baseline, while reducing token usage by 67.58% and 73.12%, respectively.
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints
Yinger Zhang | Shutong Jiang | Renhao Li | Jianhong Tu | Yang Su | Lianghao Deng | Xudong Guo | ChenXu Lv | Junyang Lin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While agent evaluation has shifted toward long-horizon tasks, most benchmarks still emphasize local, step-level reasoning rather than the global constrained optimization (e.g., time and financial budgets) that demands genuine planning ability. Meanwhile, existing LLM planning benchmarks underrepresent the active information gathering and fine-grained local constraints typical of real-world settings. To address this, we introduce DeepPlanning, a challenging benchmark for practical long-horizon agent planning. It features multi-day travel planning and multi-product shopping tasks that require proactive information acquisition, local constrained reasoning, and global constrained optimization. Evaluations on DeepPlanning show that even frontier agentic LLMs struggle with these problems, highlighting the importance of reliable explicit reasoning patterns and parallel tool use for achieving better effectiveness-efficiency trade-offs. Error analysis further points to promising directions for improving agentic LLMs over long planning horizons. We open-source the code and data to support future research.
2025
LIST: Linearly Incremental SQL Translator for Single-Hop Reasoning, Generation and Verification
Kaiyuan Guan | Ruoxin Li | Xudong Guo | Zhenning Huang | Xudong Weng | Hehuan Liu | Zheng Wei | Zang Li
Findings of the Association for Computational Linguistics: ACL 2025
SQL queries often feature nested structures that require robust interaction with databases. Complementary to the well-validated schema linking methods built on PLMs and LLMs, we introduce the Linearly Incremental SQL Translator (LIST), a novel algorithmic toolkit designed to leverage the notable reasoning and tool-interaction capabilities inherent in LLMs. LIST transforms complex SQL queries into grammatically verifiable sub-queries, which are arranged sequentially to reflect single-hop reasoning steps, enhancing both the granularity and the accuracy of database interactions. With in-context learning, our experiments demonstrate significant improvements, achieving 60.56% and 56.32% on the BIRD dataset with GPT-4o and Llama-3-70B-Instruct, respectively. To the best of our knowledge, this is SOTA performance among non-schema-linking methods, and it also surpasses a series of schema-linking-based approaches at comparable or lower cost.