CRPS: Curriculum Replay via Progressive Suffixes from Successful Trajectories for Long-Horizon LLM Agents

Zijing Zhang, Xiajie Yang


Abstract
Long-horizon LLM agents trained with sparse terminal rewards tend to learn slowly and unstably, and the problem is amplified by the group-normalized on-policy objectives commonly used for LLM training (e.g., GRPO). When rollout groups collapse to nearly all failures early in training, within-group normalization yields degenerate advantages and weak learning signals. To address this, we propose Curriculum Replay via Progressive Suffixes from Successful Trajectories (CRPS), a lightweight RL training strategy that turns serendipitous terminal successes into a within-trajectory curriculum. CRPS maintains a buffer of successful trajectories and restarts rollouts from their suffix states, with an online controller that adapts the suffix length k to the agent's competence so that replay outcomes stay informative. Across ALFWorld and WebShop with different foundation models, CRPS consistently outperforms full-episode GRPO and naive experience replay. Group-level diagnostics further show that CRPS reduces the ratio of degenerate groups and increases within-group outcome diversity, consistent with faster and more stable training.
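The abstract combines three mechanisms: GRPO-style within-group advantage normalization (which degenerates when every rollout in a group has the same outcome), a buffer of successful trajectories, and an online controller that moves the replay start point. The Python sketch below illustrates how these pieces could fit together under stated assumptions. It is not the authors' implementation. The names `SuccessBuffer` and `SuffixController`, the thresholds `low`/`high`, the ±1 update rule, and the reading of k as "number of decision steps remaining before the end of a stored success" are all hypothetical. The sketch also uses the population standard deviation, and GRPO implementations differ on this detail.

```python
import random
import statistics
from dataclasses import dataclass, field


def group_normalized_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (an assumption; variants exist)
    return [(r - mean) / (std + eps) for r in rewards]


def is_degenerate(rewards):
    """A group is degenerate when all outcomes are identical (e.g., all failures),
    so every normalized advantage is zero and the group gives no learning signal."""
    return len(set(rewards)) == 1


@dataclass
class SuffixController:
    """Hypothetical online controller for the suffix length k.

    Assumed semantics: k = number of decision steps left when a replay rollout
    restarts inside a stored successful trajectory. Larger k = harder start.
    """
    k: int = 1
    k_min: int = 1
    k_max: int = 50
    low: float = 0.2   # hypothetical: below this success rate, make replay easier
    high: float = 0.8  # hypothetical: above this success rate, make replay harder

    def update(self, group_success_rate):
        # Keep replay groups mixed (neither all-success nor all-failure).
        if group_success_rate > self.high:
            self.k = min(self.k + 1, self.k_max)
        elif group_success_rate < self.low:
            self.k = max(self.k - 1, self.k_min)
        return self.k


@dataclass
class SuccessBuffer:
    """Stores successful trajectories as lists of decision states s_0..s_{T-1}."""
    trajectories: list = field(default_factory=list)

    def add(self, states, success):
        # Only terminal successes are kept for replay.
        if success:
            self.trajectories.append(list(states))

    def sample_start_state(self, k, rng=random):
        """Pick a stored success and return the state with k decisions remaining."""
        states = rng.choice(self.trajectories)
        start = max(len(states) - k, 0)  # clamp: short trajectories restart at s_0
        return states[start]


if __name__ == "__main__":
    # An all-failure group is degenerate: every advantage is 0.0.
    print(is_degenerate([0, 0, 0, 0]), group_normalized_advantages([0, 0, 0, 0]))

    buffer = SuccessBuffer()
    buffer.add(["s0", "s1", "s2", "s3"], success=True)
    controller = SuffixController(k=1)
    controller.update(1.0)  # every replay succeeded, so k grows to 2
    print(buffer.sample_start_state(controller.k))  # state with 2 steps left: "s2"
```

The controller in this sketch only reacts to the replay group's success rate. It lengthens the suffix when replays are too easy and shortens it when they mostly fail, which is one simple way to keep replay groups from collapsing into the degenerate all-same-outcome case.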
Anthology ID: 2026.findings-acl.680
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 13891–13904
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.680/
Cite (ACL): Zijing Zhang and Xiajie Yang. 2026. CRPS: Curriculum Replay via Progressive Suffixes from Successful Trajectories for Long-Horizon LLM Agents. In Findings of the Association for Computational Linguistics: ACL 2026, pages 13891–13904, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): CRPS: Curriculum Replay via Progressive Suffixes from Successful Trajectories for Long-Horizon LLM Agents (Zhang & Yang, Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.680.pdf
Checklist: 2026.findings-acl.680.checklist.pdf