Teach Small Models to Reason by Curriculum Distillation

Wangyi Jiang, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun


Abstract
Large Reasoning Models (LRMs) show strong System-2-style reasoning, but at the cost of significant computational overhead. In contrast, efficient System-1-style Large Language Models (LLMs) often struggle on complex tasks. We identify a critical asymmetry between these two paradigms: LRMs can implicitly self-distill their own reasoning, solving hard problems with near System-1-style efficiency while retaining superior performance. LLMs, however, lack such deep internal modes and collapse when forced to rely on their own reasoning rather than imitating external traces. This asymmetry explains why direct distillation from strong LRMs to weaker LLMs often fails: student models struggle to learn from LRMs’ overly complex explicit reasoning and gain little from their overly compact implicit solutions. To address this, we introduce a two-stage curriculum distillation framework, which first builds a robust internal problem-solving student model and then teaches the student model to externalize this latent knowledge as explicit reasoning. On challenging mathematical benchmarks, our method significantly outperforms single-stage baselines, creating compact models with strong reasoning ability.
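The abstract describes a two-stage curriculum: the student first learns to solve problems from compact solutions, then learns to externalize that latent ability as explicit reasoning. The minimal Python sketch below illustrates only that staging logic; Example, StubStudent, finetune, and curriculum_distill are hypothetical placeholders, not the authors' released code, and each stage would in practice be a full supervised fine-tuning run over teacher-derived data rather than the toy update shown here.

from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Example:
    problem: str
    answer: str           # compact final solution (stage-1 target)
    reasoning: str = ""   # explicit reasoning trace (stage-2 target)


@dataclass
class StubStudent:
    """Placeholder for a small student LLM; records the supervision it receives."""
    history: List[Tuple[str, str]] = field(default_factory=list)

    def update(self, prompt: str, completion: str) -> None:
        # Stand-in for one supervised fine-tuning step.
        self.history.append((prompt, completion))


def finetune(student: StubStudent, data: List[Example], stage: int) -> StubStudent:
    for ex in data:
        if stage == 1:
            target = ex.answer                               # implicit: answer only
        else:
            target = f"{ex.reasoning}\nAnswer: {ex.answer}"  # explicit: trace, then answer
        student.update(ex.problem, target)
    return student


def curriculum_distill(student: StubStudent, data: List[Example]) -> StubStudent:
    student = finetune(student, data, stage=1)  # Stage 1: build internal problem solving
    student = finetune(student, data, stage=2)  # Stage 2: externalize it as explicit reasoning
    return student


if __name__ == "__main__":
    demo = [Example("2 + 2 * 3 = ?", "8", "Multiply first: 2 * 3 = 6; then add 2.")]
    print(curriculum_distill(StubStudent(), demo).history)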
Anthology ID: 2025.emnlp-main.376
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 7423–7433
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.376/
Cite (ACL): Wangyi Jiang, Yaojie Lu, Hongyu Lin, Xianpei Han, and Le Sun. 2025. Teach Small Models to Reason by Curriculum Distillation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 7423–7433, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Teach Small Models to Reason by Curriculum Distillation (Jiang et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.376.pdf
Checklist: 2025.emnlp-main.376.checklist.pdf