Euijin Baek
2026
TRM-Planner: Offline Target Planning and Distillation for Tiny Recursive Models
Euijin Baek | Housam Babiker | Mi-Young Kim | Randy Goebel
Findings of the Association for Computational Linguistics: ACL 2026
Tiny Recursive Models (TRMs) perform iterative reasoning with an Adaptive Computation Time (ACT)-style loop, but their supervised training targets can be brittle and their halting behavior can be difficult to tune. We introduce TRM-Planner, a two-stage distillation recipe that shifts compute to an offline teacher-cache stage. In the first stage, a frozen TRM checkpoint is unrolled over multiple refinement steps and stochastic rollouts, and for each instance we cache a small set of teacher entries (tokens, logits, step index, and quality metadata). In the second stage, a student TRM is trained with the standard TRM objective plus a distillation loss computed from the cached entries. Across Sudoku-Extreme and ARC-AGI-1/2, TRM-Planner improves on our reproduced TRM baseline while leaving the student's inference procedure unchanged. With 7M parameters, two-attempt accuracy (pass@2) increases from 43.1% to 48.1% on ARC-AGI-1 and from 6.7% to 9.2% on ARC-AGI-2.
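The student objective described above (standard objective plus a distillation term from cached teacher entries) can be illustrated with a minimal sketch. The abstract does not specify the form of the distillation loss, so everything here is an assumption: the `TeacherEntry` field names, the temperature-scaled KL divergence, the `weight` and `temperature` parameters, and the use of plain token cross-entropy as a stand-in for the standard TRM objective (which may include further terms, such as a halting loss, that are omitted).

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class TeacherEntry:
    """One cached teacher entry. Field names are hypothetical."""
    tokens: List[int]            # tokens predicted by the frozen teacher
    logits: List[List[float]]    # per-position logits over the vocabulary
    step: int                    # refinement step the entry was taken from
    quality: float               # quality metadata (assumed to be a scalar)


def log_softmax(row: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of one logit row."""
    scaled = [x / temperature for x in row]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def cross_entropy(student_logits: List[List[float]], targets: List[int]) -> float:
    """Mean token-level cross-entropy (stand-in for the standard objective)."""
    total = 0.0
    for row, target in zip(student_logits, targets):
        total -= log_softmax(row)[target]
    return total / len(targets)


def kl_distill(student_logits, teacher_logits, temperature: float = 1.0) -> float:
    """Mean KL(teacher || student) over positions (assumed distillation term)."""
    total = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        log_p = log_softmax(t_row, temperature)   # teacher distribution
        log_q = log_softmax(s_row, temperature)   # student distribution
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(student_logits)


def student_loss(student_logits, targets, entry: TeacherEntry,
                 weight: float = 0.5, temperature: float = 1.0) -> float:
    """Supervised loss plus weighted distillation from one cached entry."""
    supervised = cross_entropy(student_logits, targets)
    distill = kl_distill(student_logits, entry.logits, temperature)
    return supervised + weight * distill
```

Because the teacher logits are read from the cache, the teacher never runs during student training in this sketch. That matches the stated goal of moving teacher compute offline. How entries are selected by step index or quality metadata is not shown, since the abstract does not describe it.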