Model-Based Imaginative Planning for Embodied Agents

Junru Song; Hengzhe Jin; Yucong Huang; Tingsong Jiang; Weien Zhou; Feifei Wang; Yang Yang; Ying Wen; Wen Yao

Abstract

Reasoning and planning critically rely on a predictive dynamics model. In symbolic domains such as mathematics and code, large language models (LLMs) internalize transition rules during pretraining, allowing reinforcement learning or test-time scaling to effectively elicit and generalize their reasoning ability. Embodied decision making is fundamentally different: agents must reason from sparse visual evidence under partial observability, while coping with environment-specific dynamics and affordances not captured by language priors. Here we propose IMPLEMENT, a model-based reasoning framework that enables frozen LLMs to perform imaginative planning. A lightweight world model converts raw pixels into object-centric symbolic states amenable to language-based reasoning, and predicts their evolution under hypothetical actions. To address partial observability, we perform Monte Carlo state prediction via temperature sampling, enabling decision evaluation over multiple plausible futures. To support adaptation to unseen environments, we integrate Meta In-Context Learning, conditioning the world model on interaction history to continuously refine its predictions. At inference time, the LLM and world model form a tight co-reasoning loop: the LLM proposes candidate actions, the world model simulates future trajectories, and the LLM refines its decisions, effectively inducing an online policy iteration scheme. Extensive experiments in ALFWorld demonstrate consistent advantages over finetuning-based and strong test-time scaling approaches, validating IMPLEMENT as an effective framework for grounding language agents in visual embodied environments.
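The propose, simulate, refine loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `imaginative_plan`, its parameters, and the toy `toy_propose`, `toy_simulate`, and `toy_score` stand-ins are all hypothetical. In the paper's setting the proposer and scorer would be a frozen LLM and the simulator a learned world model sampled with temperature; here they are replaced by a one-dimensional toy problem so the example runs on its own.

```python
import random
from typing import Callable, List, Optional


def imaginative_plan(
    state: int,
    propose: Callable[[int], List[str]],
    simulate: Callable[[int, str, random.Random], int],
    score: Callable[[int], float],
    n_samples: int = 8,
    horizon: int = 2,
    rng: Optional[random.Random] = None,
) -> str:
    """Return the candidate action with the best average imagined outcome."""
    if rng is None:
        rng = random.Random()
    best_action, best_value = None, float("-inf")
    for action in propose(state):
        total = 0.0
        # Monte Carlo state prediction: sample several plausible futures.
        for _ in range(n_samples):
            imagined = simulate(state, action, rng)
            # Extend the rollout with the proposer's first choice at each step.
            for _ in range(horizon - 1):
                imagined = simulate(imagined, propose(imagined)[0], rng)
            total += score(imagined)
        value = total / n_samples
        if value > best_value:
            best_action, best_value = action, value
    return best_action


# Hypothetical toy stand-ins (assumptions, not part of the paper).
GOAL = 3


def toy_propose(state: int) -> List[str]:
    # Stand-in for the LLM proposing candidate actions.
    return ["right", "left"]


def toy_simulate(state: int, action: str, rng: random.Random) -> int:
    # Stand-in for the world model: the step size is uncertain (1 or 2).
    step = rng.choice([1, 2])
    return state + step if action == "right" else state - step


def toy_score(state: int) -> float:
    # Stand-in for the LLM judging an imagined state: closer to GOAL is better.
    return -abs(GOAL - state)


choice = imaginative_plan(0, toy_propose, toy_simulate, toy_score,
                          rng=random.Random(0))
```

Averaging scores over sampled rollouts is one simple way to evaluate a decision over multiple plausible futures; the abstract does not specify how the LLM aggregates simulated trajectories, so that choice is an assumption of this sketch.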

Anthology ID: 2026.acl-long.827
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 18125–18147
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.827/
Cite (ACL): Junru Song, Hengzhe Jin, Yucong Huang, Tingsong Jiang, Weien Zhou, Feifei Wang, Yang Yang, Ying Wen, and Wen Yao. 2026. Model-Based Imaginative Planning for Embodied Agents. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 18125–18147, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Model-Based Imaginative Planning for Embodied Agents (Song et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.827.pdf
Checklist: 2026.acl-long.827.checklist.pdf
