When Internalization Fails: Finding Better Targets for Reasoning Compression

Mourad Heddaya; Manley Roberts; Rohan Wadhawan; Chenhao Tan

Abstract

Reasoning language models generate long reasoning traces that increase latency and cost. We study how to shorten these traces while preserving accuracy on competition-level mathematics. In a teacher-student distillation setup, we compare three approaches: (i) inference-time truncation after the first k tokens, (ii) Implicit Chain-of-Thought (ICoT)-style curricula that progressively shorten the teacher trace during training, and (iii) direct distillation to shorter reasoning traces. Using NuminaMath 1.5 with traces from DeepSeek-R1 and QwQ-32B, we distill into Qwen2.5-7B and measure accuracy against total tokens generated. We find: (1) with standard SFT and first-k truncation, models compensate by generating longer text after reasoning, undermining token savings; (2) ICoT-style curricula provide little benefit on competition-level mathematics, where reasoning traces are long and diverse; and (3) training on post-think, the text the teacher generates after reasoning, achieves the best accuracy–efficiency trade-off among all shortened targets, outperforming generic summaries at matched token budgets. These results show that curriculum-based internalization methods that are effective on simple tasks do not transfer to complex reasoning, and that post-think provides a better distillation target.

Anthology ID: 2026.findings-acl.734
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 14935–14946
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.734/
Cite (ACL): Mourad Heddaya, Manley Roberts, Rohan Wadhawan, and Chenhao Tan. 2026. When Internalization Fails: Finding Better Targets for Reasoning Compression. In Findings of the Association for Computational Linguistics: ACL 2026, pages 14935–14946, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): When Internalization Fails: Finding Better Targets for Reasoning Compression (Heddaya et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.734.pdf
Checklist: 2026.findings-acl.734.checklist.pdf
