TRM-Planner: Offline Target Planning and Distillation for Tiny Recursive Models

Euijin Baek; Housam Babiker; Mi-Young Kim; Randy Goebel

Abstract

Tiny Recursive Models (TRMs) perform iterative reasoning with an Adaptive Computation Time (ACT)-style loop, but their supervised training targets can be brittle, and their halting behavior can be difficult to tune. We introduce TRM-Planner, a two-stage distillation recipe that shifts compute to an offline teacher-cache stage. In the first stage, a frozen TRM checkpoint is unrolled over multiple refinement steps and stochastic rollouts; for each instance, we cache a small set of teacher entries (tokens, logits, step index, and quality metadata). In the second stage, a student TRM is trained with the standard TRM objective plus a distillation loss computed from the cached entries. On Sudoku-Extreme and ARC-AGI-1/2, TRM-Planner improves on our reproduced TRM baseline while leaving student-time inference unchanged. On ARC-AGI-1 and ARC-AGI-2 with 7M parameters, two-attempt accuracy (pass@2) increases from 43.1% to 48.1% and from 6.7% to 9.2%, respectively.
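
The abstract does not spell out the form of the training objective, so the following is only a minimal sketch of how a "standard objective plus distillation from cached teacher logits" loss could look for a single output position. Everything specific here is an assumption made for illustration: cross-entropy stands in for the standard TRM objective, a temperature-scaled KL divergence stands in for the distillation term, and the names `combined_loss`, `weight`, and `temperature` are hypothetical rather than taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, target_index):
    """Stand-in for the standard supervised objective (an assumption)."""
    probs = softmax(student_logits)
    return -math.log(probs[target_index])


def kl_divergence(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) on temperature-scaled distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def combined_loss(student_logits, target_index, cached_teacher_logits,
                  weight=0.5, temperature=1.0):
    """Supervised term plus a weighted distillation term.

    `cached_teacher_logits` plays the role of the logits stored in one
    teacher-cache entry; `weight` and `temperature` are hypothetical knobs.
    """
    supervised = cross_entropy(student_logits, target_index)
    distill = kl_divergence(cached_teacher_logits, student_logits, temperature)
    return supervised + weight * distill


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]
    teacher = [3.0, 0.0, -2.0]  # logits read from a hypothetical cache entry
    print(combined_loss(student, target_index=0, cached_teacher_logits=teacher))
```

Because the teacher logits are read from a cache rather than computed during training, a loss of this shape adds no teacher forward passes to the student's training loop, which is consistent with the abstract's description of moving compute offline.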

Anthology ID: 2026.findings-acl.1350
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 27058–27070
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1350/
Cite (ACL): Euijin Baek, Housam Babiker, Mi-Young Kim, and Randy Goebel. 2026. TRM-Planner: Offline Target Planning and Distillation for Tiny Recursive Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 27058–27070, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): TRM-Planner: Offline Target Planning and Distillation for Tiny Recursive Models (Baek et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1350.pdf
Checklist: 2026.findings-acl.1350.checklist.pdf
