DART: Distilling Autoregressive Reasoning to Silent Thought

Nan Jiang, Ziming Wu, De-Chuan Zhan, Fuming Lai, Shaobing Lian


Abstract
Chain-of-Thought (CoT) reasoning has significantly advanced Large Language Models (LLMs) in solving complex tasks. However, its autoregressive paradigm incurs substantial computational overhead, hindering deployment in latency-sensitive applications. To address this, we propose **DART** (**D**istilling **A**utoregressive **R**easoning to Silent **T**hought), a self-distillation framework that enables LLMs to replace autoregressive CoT with non-autoregressive Silent Thought (ST). Specifically, DART introduces two training pathways: the CoT pathway for traditional reasoning and the ST pathway for generating answers directly from a few ST tokens. The ST pathway uses a lightweight Reasoning Evolvement Module (REM) to align its hidden states with those of the CoT pathway, enabling the ST tokens to evolve into informative embeddings. During inference, only the ST pathway is activated, leveraging the evolved ST tokens to deliver the answer directly. Extensive experimental results demonstrate that DART offers significant performance gains over existing non-autoregressive baselines without extra inference latency, making it a practical alternative for efficient reasoning.
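To make the hidden-state alignment idea in the abstract concrete, the sketch below illustrates one plausible form of the distillation objective: a lightweight module (here named `ReasoningEvolvementModule` after the REM in the abstract) maps the ST pathway's hidden states toward the CoT pathway's hidden states. The module architecture, tensor shapes, and MSE loss are assumptions for illustration only and are not taken from the paper's implementation.

```python
# Illustrative sketch (assumed design): align ST-pathway hidden states with
# CoT-pathway hidden states via a small adapter, as a distillation signal.
import torch
import torch.nn as nn


class ReasoningEvolvementModule(nn.Module):
    """Hypothetical lightweight adapter over Silent Thought (ST) hidden states.
    The two-layer MLP form is an assumption, not the paper's architecture."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, st_hidden: torch.Tensor) -> torch.Tensor:
        # Evolve ST hidden states into richer embeddings.
        return self.proj(st_hidden)


def alignment_loss(st_hidden: torch.Tensor,
                   cot_hidden: torch.Tensor,
                   rem: ReasoningEvolvementModule) -> torch.Tensor:
    """Pull REM-evolved ST states toward (detached) CoT-pathway states.
    MSE is an assumed choice of distillation loss."""
    evolved = rem(st_hidden)
    return nn.functional.mse_loss(evolved, cot_hidden.detach())


if __name__ == "__main__":
    hidden_size, num_st_tokens = 768, 4  # assumed sizes for the toy example
    rem = ReasoningEvolvementModule(hidden_size)
    st_hidden = torch.randn(1, num_st_tokens, hidden_size)   # ST pathway states
    cot_hidden = torch.randn(1, num_st_tokens, hidden_size)  # CoT pathway states
    print(alignment_loss(st_hidden, cot_hidden, rem).item())
```

In such a setup the alignment loss would be combined with the usual answer-prediction loss during training, while inference would run only the ST pathway (plus REM), avoiding autoregressive CoT decoding; the exact weighting and placement of the loss are design choices not specified by the abstract.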
Anthology ID:
2025.emnlp-main.256
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5100–5108
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.256/
Cite (ACL):
Nan Jiang, Ziming Wu, De-Chuan Zhan, Fuming Lai, and Shaobing Lian. 2025. DART: Distilling Autoregressive Reasoning to Silent Thought. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 5100–5108, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
DART: Distilling Autoregressive Reasoning to Silent Thought (Jiang et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.256.pdf
Checklist:
2025.emnlp-main.256.checklist.pdf