AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs

Hongxin Li, Jingfan Chen, Jingran Su, Yuntao Chen, Li Qing, Zhaoxiang Zhang


Abstract
User interface understanding with vision-language models (VLMs) has received much attention due to its potential for enhancing software automation.However, existing datasets used to build UI-VLMs either only contain large-scale context-free element annotations or contextualized functional descriptions for elements at a small scale.In this work, we propose the AutoGUI pipeline for automatically annotating UI elements with detailed functionality descriptions at scale.Specifically, we leverage large language models (LLMs) to infer element functionality by comparing UI state changes before and after simulated interactions. To improve annotation quality, we propose LLM-aided rejection and verification, eliminating invalid annotations without human labor.We construct a high-quality AutoGUI-704k dataset using the proposed pipeline, featuring diverse and detailed functionality annotations that are hardly provided by previous datasets.Human evaluation shows that we achieve annotation correctness comparable to a trained human annotator. Extensive experiments show that our dataset remarkably enhances VLM’s UI grounding capabilities and exhibits significant scaling effects. We also show the interesting potential use of our dataset in UI agent tasks. Please view our project at https://autogui-project.github.io/.
Anthology ID:
2025.acl-long.510
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
10323–10358
Language:
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.510/
DOI:
Bibkey:
Cite (ACL):
Hongxin Li, Jingfan Chen, Jingran Su, Yuntao Chen, Li Qing, and Zhaoxiang Zhang. 2025. AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10323–10358, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs (Li et al., ACL 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.510.pdf