Browsing Like Human: A Multimodal Web Agent with Experiential Fast-and-Slow Thinking

Haohao Luo, Jiayi Kuang, Wei Liu, Ying Shen, Jian Luan, Yang Deng


Abstract
Automating web navigation, which aims to build a web agent that follows user instructions to complete tasks like booking flights by interacting with websites, has received increasing attention due to its practical value. Although most existing web agents are equipped with visual perception, planning, and memory abilities, their reasoning processes still deviate from human cognition. In this work, we study human thought patterns to empower agents with more human-like abilities in web navigation. To this end, we propose a novel multimodal web agent framework called WebExperT, which is designed to emulate the human planning process of “thinking fast and slow” to effectively decompose complex user instructions. Furthermore, WebExperT leverages experiential learning by reflecting on failures to continuously refine its planning and decision-making outcomes. Experimental results on the Mind2Web benchmark demonstrate the superiority of WebExperT in both supervised and unsupervised settings.
Anthology ID:
2025.acl-long.697
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
14232–14251
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.697/
Cite (ACL):
Haohao Luo, Jiayi Kuang, Wei Liu, Ying Shen, Jian Luan, and Yang Deng. 2025. Browsing Like Human: A Multimodal Web Agent with Experiential Fast-and-Slow Thinking. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14232–14251, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Browsing Like Human: A Multimodal Web Agent with Experiential Fast-and-Slow Thinking (Luo et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.697.pdf