On the Fine-Grained Planning Abilities of VLM Web Agents
Surgan Jandial, Yinong Oliver Wang, Andrea Bajcsy, Fernando De la Torre
Abstract
Vision-Language Models (VLMs) have shown promise as web agents, yet their planning, the ability to devise strategies or action sequences to complete tasks, remains understudied. While prior work focuses on VLMs’ perception and overall success rates (i.e., goal completion), fine-grained investigation of their planning has been overlooked. To address this gap, we examine VLMs’ capability to (1) understand temporal relationships within web contexts, and (2) assess plans of actions across diverse scenarios. We design four simple yet effective tests to probe these aspects of planning. Our results across nineteen VLMs reveal that these models exhibit limited performance in these skills and are not reliable enough to function as web agents. To facilitate future work, we release our planning evaluations and data, providing a foundation for future research in this area.
- Anthology ID: 2025.findings-emnlp.1382
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 25347–25380
- URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1382/
- DOI: 10.18653/v1/2025.findings-emnlp.1382
- Cite (ACL): Surgan Jandial, Yinong Oliver Wang, Andrea Bajcsy, and Fernando De la Torre. 2025. On the Fine-Grained Planning Abilities of VLM Web Agents. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 25347–25380, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): On the Fine-Grained Planning Abilities of VLM Web Agents (Jandial et al., Findings 2025)
- PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1382.pdf