Escaping the Echo Trap: On Credit Assignment Failure in Multi-turn LLM Self-Reflection
Linxuan Du, Guangquan Xue, Xiaobo Liang, Qipeng Huang, Yuyang Ding, Xinyu Shi, Zhang Yijun, Ji Qi, Wenpeng Zhu, Juntao Li, Min Zhang
Abstract
Despite the potential of multi-turn self-reflection to improve LLM reasoning, its effectiveness in practice is severely constrained by a failure mode we term the Echo Trap. Specifically, this phenomenon gives rise to two coupled problems: (1) the model becomes limited by its inherent capabilities and tends to repeat earlier reflections to preserve reward signals; (2) once such “copy” behavior is reinforced, the model ceases to try new strategies, leading to exploration collapse. We attribute this issue to imprecise credit assignment during training, as standard GRPO assigns rewards at the trajectory level, making it difficult to distinguish which reflection steps contribute to improved outcomes. To address this limitation, we propose a tree-structured extension of GRPO for multi-turn self-reflection, which enables more accurate advantage estimation. Through extensive experiments, we analyze the Echo Trap and demonstrate that our method effectively mitigates behavior collapse and improves performance across multiple benchmarks.
- Anthology ID:
- 2026.acl-long.1636
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 35393–35405
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1636/
- Cite (ACL):
- Linxuan Du, Guangquan Xue, Xiaobo Liang, Qipeng Huang, Yuyang Ding, Xinyu Shi, Zhang Yijun, Ji Qi, Wenpeng Zhu, Juntao Li, and Min Zhang. 2026. Escaping the Echo Trap: On Credit Assignment Failure in Multi-turn LLM Self-Reflection. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 35393–35405, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Escaping the Echo Trap: On Credit Assignment Failure in Multi-turn LLM Self-Reflection (Du et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1636.pdf
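
Illustrative sketch (not from the paper): the abstract attributes the Echo Trap to standard GRPO assigning rewards at the trajectory level. The snippet below shows what that can look like in a minimal form: one scalar reward per sampled trajectory is normalized within its group and then copied to every reflection turn, so a helpful turn and a merely repeated turn in the same trajectory receive the same advantage. The function name, the example rewards, the turn counts, and the use of the population standard deviation are assumptions made for illustration. The authors' tree-structured extension is not reproduced here, since the abstract does not specify it.

```python
from statistics import mean, pstdev


def grpo_trajectory_advantages(rewards, turns_per_trajectory, eps=1e-8):
    """Group-normalize one scalar reward per trajectory, then copy it to every turn.

    Hypothetical helper for illustration only; real implementations work on
    token-level tensors and may use a different standard-deviation estimator.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    per_trajectory = [(r - mu) / (sigma + eps) for r in rewards]
    # Every turn in a trajectory inherits that trajectory's single advantage.
    return [[a] * n for a, n in zip(per_trajectory, turns_per_trajectory)]


# Four sampled trajectories, three reflection turns each (made-up numbers);
# reward 1.0 stands for a correct final answer, 0.0 for an incorrect one.
advantages = grpo_trajectory_advantages([1.0, 0.0, 0.0, 1.0], [3, 3, 3, 3])
for i, turn_advantages in enumerate(advantages):
    print(i, turn_advantages)
```

Within each printed row the values are identical, which is the point of the example: the training signal cannot tell which reflection turn actually changed the outcome.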