LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?

Jingyuan Wang; Yankai Chen; Zhonghang Li; Chao Huang

Abstract

Large language models (LLMs) have demonstrated remarkable progress in reasoning, often through supervised fine-tuning (SFT). However, SFT is resource-intensive, relying on large curated datasets, rejection-sampled demonstrations, and uniform optimization across all tokens—even though only a fraction carry meaningful learning value. In this work, we explore a counterintuitive idea: can smaller language models (SLMs) teach larger language models (LLMs) by revealing high-value reasoning moments that reflect the latter’s unique strength? We propose LightReasoner, a novel framework that leverages the behavioral divergence between a stronger expert model (LLM) and a weaker amateur model (SLM). LightReasoner operates in two stages: (1) a sampling stage that pinpoints critical reasoning moments and constructs supervision examples capturing the expert’s advantage through expert–amateur contrast, and (2) a fine-tuning stage that aligns the expert model with these distilled examples, amplifying its reasoning strengths. Across seven benchmarks, LightReasoner improves accuracy by up to 28.1%, while reducing time consumption by 90%, sampled problems by 80%, and tuned token usage by 99%, all without relying on ground-truth labels. By turning weaker SLMs into effective teaching signals, LightReasoner offers a scalable and resource-efficient approach for advancing LLM reasoning.
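The sampling stage described above can be illustrated with a minimal sketch. The abstract does not say how expert–amateur divergence is measured, so this example assumes per-step KL divergence between the two models' next-token distributions and a fixed cutoff. The function names, the `threshold` value, and the toy distributions are all hypothetical and are not taken from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary.

    Assumption: KL is used here only as one plausible divergence measure.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def select_critical_steps(expert_dists, amateur_dists, threshold=0.4):
    """Return indices of generation steps where the expert's next-token
    distribution diverges from the amateur's by more than `threshold`.

    `threshold` is a hypothetical cutoff chosen for this toy example.
    """
    selected = []
    for step, (p, q) in enumerate(zip(expert_dists, amateur_dists)):
        if kl_divergence(p, q) > threshold:
            selected.append(step)
    return selected

# Toy data: a 3-token vocabulary and 3 generation steps (made up).
expert = [[0.90, 0.05, 0.05], [0.34, 0.33, 0.33], [0.05, 0.90, 0.05]]
amateur = [[0.30, 0.40, 0.30], [0.33, 0.34, 0.33], [0.10, 0.85, 0.05]]

# Only the first step shows a large expert-amateur gap.
critical = select_critical_steps(expert, amateur)
print(critical)
```

In this toy setting only the steps where the two models disagree strongly would be kept as candidate supervision examples; steps where they behave almost identically are skipped, which is one way the reduction in tuned tokens could arise.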

Anthology ID: 2026.acl-long.122
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 2635–2663
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.122/
Cite (ACL): Jingyuan Wang, Yankai Chen, Zhonghang Li, and Chao Huang. 2026. LightReasoner: Can Small Language Models Teach Large Language Models Reasoning? In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2635–2663, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): LightReasoner: Can Small Language Models Teach Large Language Models Reasoning? (Wang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.122.pdf
Checklist: 2026.acl-long.122.checklist.pdf
