Long Chain-of-Thought Fine-tuning via Understanding-to-Reasoning Transition

Chenxin An, Zhihui Xie, Xiaonan Li, Ming Zhong, Shansan Gong, Lei Li, Jun Zhang, Jingjing Xu, Lingpeng Kong


Abstract
Reasoning models have demonstrated remarkable performance on complex tasks by generating long reasoning traces prior to producing final answers. However, previous research on long-context scaling in language models has generally focused on managing lengthy input prompts rather than producing long outputs. To leverage the strong long-context understanding abilities of current models, we introduce Understanding-to-Reasoning Transition (URT) fine-tuning, a sequence-level curriculum learning framework that gradually shifts a model’s focus from interpreting long chains of thought to generating them. By incorporating partial reasoning steps into the input context, URT naturally exposes the model to diverse prompt lengths during training, preserving its performance on long-context comprehension while developing advanced reasoning capabilities. Experiments on rigorous reasoning benchmarks, including AIME24 and GPQA Diamond, show that our approach surpasses standard fine-tuning by over 10%, while maintaining robust performance on the understanding tasks in RULER.
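
The abstract describes a curriculum that moves partial reasoning steps from the input context into the generation target. The sketch below illustrates one plausible way such training examples could be constructed; it is not the authors' code, and the per-line step splitting, the prompt template, and the linear stage schedule are assumptions made for illustration.

```python
# Hypothetical sketch of URT-style curriculum data construction (assumptions, not the paper's implementation).
# Idea: early stages place most of the long chain-of-thought (CoT) in the *prompt*
# (understanding); later stages move more of it into the *target* (reasoning).

from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    question: str   # original problem statement
    cot: str        # full long chain-of-thought trace
    answer: str     # final answer


def build_urt_example(ex: Example, keep_ratio: float) -> dict:
    """Keep `keep_ratio` of the CoT steps in the prompt, generate the rest.

    keep_ratio = 1.0 -> pure understanding (whole CoT given as context)
    keep_ratio = 0.0 -> pure reasoning (model generates the whole CoT)
    """
    steps = ex.cot.split("\n")                       # assume one reasoning step per line
    cut = int(len(steps) * keep_ratio)
    context_steps, target_steps = steps[:cut], steps[cut:]

    prompt = ex.question
    if context_steps:
        prompt += "\n\nPartial reasoning:\n" + "\n".join(context_steps)

    target = "\n".join(target_steps + [f"Final answer: {ex.answer}"])
    return {"prompt": prompt, "target": target}


def urt_curriculum(data: List[Example], num_stages: int = 4) -> List[dict]:
    """Sequence-level curriculum: later stages expose less of the CoT in the prompt."""
    out = []
    for stage in range(num_stages):
        keep_ratio = 1.0 - stage / max(num_stages - 1, 1)  # 1.0 down to 0.0
        out.extend(build_urt_example(ex, keep_ratio) for ex in data)
    return out
```

Because `keep_ratio` varies across stages, the resulting prompts span a wide range of lengths, which is the mechanism the abstract credits for preserving long-context comprehension while the model learns to generate long reasoning traces.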
Anthology ID:
2025.emnlp-main.1751
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
34506–34522
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1751/
Cite (ACL):
Chenxin An, Zhihui Xie, Xiaonan Li, Ming Zhong, Shansan Gong, Lei Li, Jun Zhang, Jingjing Xu, and Lingpeng Kong. 2025. Long Chain-of-Thought Fine-tuning via Understanding-to-Reasoning Transition. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 34506–34522, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Long Chain-of-Thought Fine-tuning via Understanding-to-Reasoning Transition (An et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1751.pdf
Checklist:
2025.emnlp-main.1751.checklist.pdf