TRM-Planner: Offline Target Planning and Distillation for Tiny Recursive Models

Euijin Baek, Housam Babiker, Mi-Young Kim, Randy Goebel


Abstract
Tiny Recursive Models (TRMs) perform iterative reasoning with an Adaptive Computation Time (ACT)-style loop, but their supervised training targets can be brittle and their halting behavior can be difficult to tune. We introduce TRM-Planner, a two-stage distillation recipe that shifts compute to an offline teacher-cache stage. In the first stage, a frozen TRM checkpoint is unrolled over multiple refinement steps and stochastic rollouts, and for each instance we cache a small set of teacher entries (tokens, logits, step index, and quality metadata). In the second stage, a student TRM is trained with the standard TRM objective plus a distillation loss computed from the cached entries. Across Sudoku-Extreme and ARC-AGI-1/2, TRM-Planner improves on our reproduced TRM baseline while leaving student inference unchanged. With 7M parameters, two-attempt accuracy (pass@2) increases from 43.1% to 48.1% on ARC-AGI-1 and from 6.7% to 9.2% on ARC-AGI-2.
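The second stage described in the abstract (standard objective plus a distillation term from cached teacher entries) can be illustrated with a minimal sketch. The abstract does not specify the form of the distillation loss, so everything here is an assumption for illustration: the per-token cross-entropy as the stand-in for the standard objective, the KL divergence between teacher and student distributions as the distillation term, and the hypothetical names `combined_loss`, `distill_weight`, and `temperature`. It should not be read as the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, target_index):
    """Stand-in for the standard supervised objective on one token."""
    probs = softmax(student_logits)
    return -math.log(probs[target_index])


def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student); terms with zero teacher mass contribute 0."""
    return sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )


def combined_loss(student_logits, target_index, cached_teacher_logits,
                  distill_weight=0.5, temperature=1.0):
    """Supervised loss plus a weighted distillation term (assumed form).

    `cached_teacher_logits` plays the role of the logits stored in one
    cached teacher entry; the weighting scheme is hypothetical.
    """
    task = cross_entropy(student_logits, target_index)
    teacher = softmax(cached_teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return task + distill_weight * kl_divergence(teacher, student)


# Hypothetical cached entry for one token position, mirroring the fields
# named in the abstract (tokens, logits, step index, quality metadata).
entry = {"token": 2, "logits": [0.1, 0.3, 2.0], "step": 4, "quality": 1.0}
loss = combined_loss([0.0, 0.5, 1.0], entry["token"], entry["logits"])
```

Because the teacher logits are read from a cache rather than computed during training, the student's forward pass and its inference procedure are the same as without distillation; only the training loss gains an extra term.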
Anthology ID:
2026.findings-acl.1350
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
27058–27070
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1350/
Cite (ACL):
Euijin Baek, Housam Babiker, Mi-Young Kim, and Randy Goebel. 2026. TRM-Planner: Offline Target Planning and Distillation for Tiny Recursive Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 27058–27070, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
TRM-Planner: Offline Target Planning and Distillation for Tiny Recursive Models (Baek et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1350.pdf
Checklist:
 2026.findings-acl.1350.checklist.pdf