Escaping the Echo Trap: On Credit Assignment Failure in Multi-turn LLM Self-Reflection

Linxuan Du; Guangquan Xue; Xiaobo Liang; Qipeng Huang; Yuyang Ding; Xinyu Shi; Zhang Yijun; Ji Qi; Wenpeng Zhu; Juntao Li; Min Zhang

Abstract

Despite the potential of multi-turn self-reflection to improve LLM reasoning, its effectiveness in practice is severely constrained by a failure mode we term the Echo Trap. This phenomenon gives rise to two coupled problems: (1) the model, limited by its inherent capabilities, tends to repeat earlier reflections to preserve reward signals; (2) once such “copy” behavior is reinforced, the model stops trying new strategies, leading to exploration collapse. We attribute this issue to imprecise credit assignment during training: standard GRPO assigns rewards at the trajectory level, making it difficult to distinguish which reflection steps contribute to improved outcomes. To address this limitation, we propose a tree-structured extension of GRPO for multi-turn self-reflection, which enables more accurate advantage estimation. Through extensive experiments, we analyze the Echo Trap and demonstrate that our method effectively mitigates behavior collapse and improves performance across multiple benchmarks.
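
To make the credit assignment issue concrete, the following is a minimal sketch of trajectory-level, group-relative advantage computation in the style of standard GRPO. It is not the authors' implementation: the function name `grpo_turn_advantages`, the use of the population standard deviation, the `eps` constant, and the toy rewards are all assumptions made for illustration. The tree-structured extension is not sketched here, since the abstract does not specify it.

```python
from statistics import mean, pstdev


def grpo_turn_advantages(rewards, turns_per_trajectory, eps=1e-8):
    """Normalize each trajectory's reward against its group, then copy that
    single advantage to every reflection turn in the trajectory.

    rewards: one scalar outcome reward per sampled trajectory (hypothetical).
    turns_per_trajectory: number of reflection turns in each trajectory.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    per_trajectory = [(r - mu) / (sigma + eps) for r in rewards]
    # Every turn inherits the trajectory-level value, so a turn that merely
    # repeats an earlier reflection is credited exactly like a turn that
    # actually fixed the answer.
    return [[a] * n for a, n in zip(per_trajectory, turns_per_trajectory)]


# Toy group of four sampled trajectories with binary outcome rewards.
rewards = [1.0, 0.0, 1.0, 0.0]
turns = [3, 3, 2, 2]
advantages = grpo_turn_advantages(rewards, turns)
```

Within any one trajectory the per-turn advantages are identical, which is the sense in which a trajectory-level signal cannot distinguish helpful reflection steps from redundant ones.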

Anthology ID: 2026.acl-long.1636
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 35393–35405
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1636/
Cite (ACL): Linxuan Du, Guangquan Xue, Xiaobo Liang, Qipeng Huang, Yuyang Ding, Xinyu Shi, Zhang Yijun, Ji Qi, Wenpeng Zhu, Juntao Li, and Min Zhang. 2026. Escaping the Echo Trap: On Credit Assignment Failure in Multi-turn LLM Self-Reflection. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 35393–35405, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Escaping the Echo Trap: On Credit Assignment Failure in Multi-turn LLM Self-Reflection (Du et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1636.pdf
Checklist: 2026.acl-long.1636.checklist.pdf
