ProcWorld: Benchmarking Large Model Planning in Reachability-Constrained Environments

Dong Wang; Xinghang Li; Zhengshen Zhang; Jirong Liu; Xiao Ma; Hanbo Zhang; Tao Kong; Huaping Liu

Abstract

We introduce ProcWorld, a large-scale benchmark for partially observable embodied spatial reasoning and long-term planning with large language models (LLMs) and vision language models (VLMs). ProcWorld features a wide range of challenging embodied navigation and object manipulation tasks, covering 16 task types, 5,000 rooms, and over 10 million evaluation trajectories with diverse data distributions. It supports configurable observation modes, ranging from text-only descriptions to vision-only observations, and uses text-based actions to control an agent that follows language instructions. ProcWorld presents three significant challenges for LLMs and VLMs: (1) active information gathering under partial observations for disambiguation; (2) simultaneous localization and decision-making by tracking the spatio-temporal state-action distribution; and (3) constrained reasoning over dynamic states subject to physical reachability. Our extensive evaluation of 15 foundation models and 5 reasoning algorithms (with over 1 million rollouts) indicates that larger models perform better. However, ProcWorld remains highly challenging for existing state-of-the-art models and in-context learning methods because of constrained reachability and the need for combinatorial spatial reasoning.

Anthology ID: 2025.emnlp-main.635
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 12575–12605
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.635/
Cite (ACL): Dong Wang, Xinghang Li, Zhengshen Zhang, Jirong Liu, Xiao Ma, Hanbo Zhang, Tao Kong, and Huaping Liu. 2025. ProcWorld: Benchmarking Large Model Planning in Reachability-Constrained Environments. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 12575–12605, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): ProcWorld: Benchmarking Large Model Planning in Reachability-Constrained Environments (Wang et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.635.pdf
Checklist: 2025.emnlp-main.635.checklist.pdf
