Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

Mohamed Aghzal, Gregory J. Stein, Ziyu Yao


Abstract
Large language model (LLM) web agents are increasingly used for web navigation but remain far from human reliability on realistic, long-horizon tasks. Existing evaluations focus primarily on end-to-end success, offering limited insight into where failures arise. We propose a hierarchical planning framework that analyzes web agents across three layers (i.e., high-level planning, low-level execution, and re-planning), enabling process-based evaluation of reasoning, grounding, and recovery. Our experiments show that structured Planning Domain Definition Language (PDDL) plans produce more concise and goal-directed strategies than natural language (NL) plans, but low-level execution remains the dominant bottleneck. These results indicate that improving perceptual grounding and adaptive control, not only high-level reasoning, is critical for achieving human-level reliability. This hierarchical perspective provides a principled foundation for diagnosing and advancing LLM web agents.
Anthology ID: 2026.acl-long.1483
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 32157–32180
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1483/
Cite (ACL): Mohamed Aghzal, Gregory J. Stein, and Ziyu Yao. 2026. Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 32157–32180, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective (Aghzal et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1483.pdf
Checklist: 2026.acl-long.1483.checklist.pdf