VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms

Seungwon Lim; Sungwoong Kim; Jihwan Yu; Sungjae Lee; Jiwan Chung; Youngjae Yu

Abstract

Escape rooms present a unique cognitive challenge that demands exploration-driven planning: with the sole instruction to escape the room, players must actively search their environment, collect information, and find solutions through repeated trial and error. Motivated by this, we introduce VisEscape, a benchmark of 20 virtual escape rooms specifically designed to evaluate AI models under these challenging conditions, where success depends not only on solving isolated puzzles but also on iteratively constructing and refining spatial-temporal knowledge of a dynamically changing environment. On VisEscape, we observe that even state-of-the-art multi-modal models generally fail to escape the rooms, showing considerable variation in their progress and problem-solving approaches. We find that integrating memory management and reasoning contributes to efficient exploration and enables successive hypothesis formulation and testing, thereby leading to significant improvements in dynamic and exploration-driven environments.

Anthology ID: 2025.emnlp-main.810
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 16031–16058
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.810/
Cite (ACL): Seungwon Lim, Sungwoong Kim, Jihwan Yu, Sungjae Lee, Jiwan Chung, and Youngjae Yu. 2025. VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 16031–16058, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms (Lim et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.810.pdf
Checklist: 2025.emnlp-main.810.checklist.pdf
