Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning

Qifan Yu, Zhenyu He, Sijie Li, Zhou Xun, Jun Zhang, Jingjing Xu, Di He


Abstract
Chain-of-Thought (CoT) prompting has emerged as a powerful technique for enhancing language models’ reasoning capabilities. However, generating long and correct CoT trajectories is challenging. Recent studies have demonstrated that Looped Transformers (standard Transformers with a cross-block parameter-sharing architecture) possess remarkable length generalization capabilities, but their limited generality and adaptability prevent them from serving as an alternative to auto-regressive solutions. To better leverage the strengths of Looped Transformers, we propose **RELAY** (**RE**asoning through **L**oop **A**lignment iterativel**Y**). Specifically, we align the steps of CoT reasoning with loop iterations and apply intermediate supervision during the training of Looped Transformers. This additional iteration-wise supervision not only preserves the Looped Transformer’s ability for length generalization but also enables it to predict CoT reasoning steps for unseen data. We therefore use this Looped Transformer to generate accurate reasoning chains for complex problems that exceed the training length, which are then used to fine-tune an auto-regressive model. We conduct extensive experiments, and the results demonstrate the effectiveness of our approach, with significant improvements in the performance of the auto-regressive model.
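The core idea of the abstract — applying one parameter-shared block repeatedly and supervising each loop iteration against the corresponding CoT step — can be sketched as follows. This is a hypothetical toy illustration, not the paper's implementation: `shared_block`, its linear update, and the squared-error loss are all stand-in assumptions chosen for brevity.

```python
# Toy sketch of loop-aligned intermediate supervision (hypothetical, not the
# paper's code): one shared block is applied once per CoT step, and a loss is
# accumulated at EVERY iteration rather than only on the final output.

def shared_block(state, w):
    # One parameter-shared step: the same toy linear update is reused at
    # every loop iteration (cross-block parameter sharing).
    return [w * s for s in state]

def loop_with_intermediate_supervision(x, cot_targets, w):
    """Run the looped model once over len(cot_targets) iterations.

    Each loop iteration is aligned with one CoT reasoning step, and the
    iteration's output is compared against that step's target, so the
    training signal supervises the whole chain, not just the answer.
    """
    state, total_loss = x, 0.0
    for target in cot_targets:              # one iteration per CoT step
        state = shared_block(state, w)
        total_loss += sum((s - t) ** 2 for s, t in zip(state, target))
    return state, total_loss
```

Because the number of iterations is tied to the number of CoT steps, the same shared block can simply be looped more times at inference to handle longer problems — the length-generalization property the abstract highlights.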
Anthology ID:
2026.eacl-long.97
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
2206–2222
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.97/
Cite (ACL):
Qifan Yu, Zhenyu He, Sijie Li, Zhou Xun, Jun Zhang, Jingjing Xu, and Di He. 2026. Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2206–2222, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning (Yu et al., EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.97.pdf