Unlocking Recursive Thinking of LLMs: Alignment via Refinement

Haoke Zhang, Xiaobo Liang, Cunxiang Wang, Juntao Li, Min Zhang


Abstract
The OpenAI o1-series models have demonstrated that leveraging long-form Chain of Thought (CoT) can substantially enhance performance. However, the recursive thinking capabilities of Large Language Models (LLMs) remain limited, particularly in the absence of expert-curated data for distillation. In this paper, we propose AvR: Alignment via Refinement, a novel method aimed at unlocking the potential of LLMs for recursive reasoning through long-form CoT. AvR introduces a refinement process that integrates criticism and improvement actions, guided by differentiable learning techniques to optimize refinement-aware rewards. As a result, the synthesized multi-round data can be organized as a long refinement thought, further enabling test-time scaling. Experimental results show that AvR significantly outperforms conventional preference optimization methods. Notably, with only 3k synthetic samples, our method boosts the performance of the LLaMA-3-8B-Instruct model by over 20% in win rate on AlpacaEval 2.0. Our code is available on GitHub.
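The refinement loop the abstract describes (draft an answer, then alternate criticism and improvement actions, concatenating the rounds into a long refinement thought) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `llm` callable, the prompt templates, and the round count are hypothetical placeholders.

```python
# Minimal sketch of the criticize-then-improve refinement loop described in
# the abstract. The `llm` callable and the prompt templates are hypothetical
# placeholders, not the paper's actual prompts or training pipeline.

from typing import Callable, List

def refine(llm: Callable[[str], str], question: str, rounds: int = 3) -> str:
    """Build a long refinement thought: draft -> (critique -> improve) x N."""
    answer = llm(f"Question: {question}\nAnswer:")
    trace: List[str] = [f"Draft answer:\n{answer}"]
    for _ in range(rounds):
        # Criticism action: ask the model to find flaws in its own answer.
        critique = llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "Point out any errors or weaknesses in this answer."
        )
        # Improvement action: revise the answer conditioned on the critique.
        answer = llm(
            f"Question: {question}\nAnswer: {answer}\n"
            f"Critique: {critique}\nWrite an improved answer."
        )
        trace.append(f"Critique:\n{critique}\n\nImproved answer:\n{answer}")
    # The concatenated multi-round trace is the long refinement thought that
    # can serve as synthetic training data or support test-time scaling.
    return "\n\n".join(trace)
```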
Anthology ID: 2025.findings-acl.582
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 11169–11182
URL: https://preview.aclanthology.org/transition-to-people-yaml/2025.findings-acl.582/
DOI: 10.18653/v1/2025.findings-acl.582
Cite (ACL): Haoke Zhang, Xiaobo Liang, Cunxiang Wang, Juntao Li, and Min Zhang. 2025. Unlocking Recursive Thinking of LLMs: Alignment via Refinement. In Findings of the Association for Computational Linguistics: ACL 2025, pages 11169–11182, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Unlocking Recursive Thinking of LLMs: Alignment via Refinement (Zhang et al., Findings 2025)
PDF: https://preview.aclanthology.org/transition-to-people-yaml/2025.findings-acl.582.pdf