PGPO: Enhancing Agent Reasoning via Pseudocode-style Planning Guided Preference Optimization

Zouying Cao; Runze Wang; Yifei Yang; Xinbei Ma; Xiaoyong Zhu; Bo Zheng; Hai Zhao

Abstract

Large Language Model (LLM) agents have demonstrated impressive capabilities in handling complex interactive problems. Existing LLM agents mainly generate natural language (NL) plans to guide reasoning, which are verbose and inefficient. NL plans are also tailored to specific tasks and restrict agents’ ability to generalize across similar tasks. To this end, we explore pseudocode-style plans (P-code Plan) to capture the structural logic of reasoning. We find that P-code Plan empowers LLM agents with stronger generalization ability and higher efficiency. Inspired by this finding, we propose a pseudocode-style Planning Guided Preference Optimization method called PGPO for effective agent learning. With two planning-oriented rewards, PGPO further enhances LLM agents’ ability to generate high-quality P-code Plans and subsequent reasoning. Experiments show that PGPO achieves superior performance on representative agent benchmarks and outperforms the current leading baselines. Analyses reveal the advantage of PGPO in reducing action errors and omissions during reasoning.

Anthology ID: 2025.findings-acl.774
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 14966–14985
URL: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.774/
Cite (ACL): Zouying Cao, Runze Wang, Yifei Yang, Xinbei Ma, Xiaoyong Zhu, Bo Zheng, and Hai Zhao. 2025. PGPO: Enhancing Agent Reasoning via Pseudocode-style Planning Guided Preference Optimization. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14966–14985, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): PGPO: Enhancing Agent Reasoning via Pseudocode-style Planning Guided Preference Optimization (Cao et al., Findings 2025)
PDF: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.774.pdf
