Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst

Hongru Wang, Deng Cai, Wanjun Zhong, Shijue Huang, Jeff Z. Pan, Zeming Liu, Kam-Fai Wong


Abstract
Inference-time scaling has attracted much attention, as it significantly enhances the performance of Large Language Models (LLMs) on complex reasoning tasks by increasing the length of the Chain-of-Thought (CoT). These longer intermediate reasoning rationales embody various meta-reasoning skills from human cognition, such as reflection and decomposition, and are difficult to create and acquire. In this work, we introduce the Self-Reasoning Language Model (SRLM), in which the model itself synthesizes longer CoT data and iteratively improves its performance through self-training. By incorporating a few demonstration examples (i.e., 1,000 samples) showing how to unfold hidden reasoning chains from existing responses, which act as a reasoning catalyst, we demonstrate that SRLM not only enhances the model's initial performance but also yields more stable and consistent improvements in subsequent iterations. Our proposed SRLM achieves an average absolute improvement of more than +2.5 points across five reasoning tasks (MMLU, GSM8K, ARC-C, HellaSwag, and BBH) on two backbone models. Moreover, its gains grow with more sampling during inference, e.g., an absolute +7.89 average improvement with 64 samples, revealing the deeper, more diverse, and more creative reasoning paths of SRLM compared with a strong baseline.
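The abstract describes an iterative self-training loop: the current model unfolds longer CoT rationales for training questions, useful rationales are kept, and the model is fine-tuned on them together with the small catalyst set. The following is a minimal sketch of that loop under stated assumptions; `generate_cot`, `is_correct`, and `finetune` are hypothetical placeholders (not the authors' released code), and filtering self-generated rationales by answer correctness is one common choice that may differ from the paper's actual procedure.

```python
import random

def generate_cot(model, question, n_samples=4):
    """Hypothetical stand-in for sampling longer chain-of-thought
    rationales from the current model (the 'unfolding' step)."""
    return [f"{question} -> reasoning sample {i}" for i in range(n_samples)]

def is_correct(rationale, reference_answer):
    """Hypothetical answer check used to filter self-generated rationales."""
    return random.random() > 0.5  # placeholder for an exact-match or verifier check

def finetune(model, data):
    """Hypothetical supervised fine-tuning step on the mixed data."""
    return f"{model}+sft({len(data)} examples)"

def self_reasoning_iteration(model, train_set, catalyst_set):
    """One SRLM-style round: synthesize longer CoT with the current model,
    keep rationales whose final answer matches the reference, then fine-tune
    on the kept data mixed with the small catalyst set (~1,000 demos)."""
    synthesized = []
    for question, answer in train_set:
        for rationale in generate_cot(model, question):
            if is_correct(rationale, answer):
                synthesized.append((question, rationale))
    return finetune(model, synthesized + catalyst_set)

# Toy usage: a few self-training iterations on placeholder data.
model = "base-llm"
train_set = [("Q1", "A1"), ("Q2", "A2")]
catalyst_set = [("demo question", "demo unfolded reasoning chain")]
for _ in range(3):
    model = self_reasoning_iteration(model, train_set, catalyst_set)
print(model)
```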
Anthology ID:
2025.findings-acl.291
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5578–5596
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.291/
Cite (ACL):
Hongru Wang, Deng Cai, Wanjun Zhong, Shijue Huang, Jeff Z. Pan, Zeming Liu, and Kam-Fai Wong. 2025. Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst. In Findings of the Association for Computational Linguistics: ACL 2025, pages 5578–5596, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst (Wang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.291.pdf