Retrieval-augmented GUI Agents with Generative Guidelines

Ran Xu, Kaixin Ma, Wenhao Yu, Hongming Zhang, Joyce C. Ho, Carl Yang, Dong Yu


Abstract
GUI agents powered by vision-language models (VLMs) show promise in automating complex digital tasks. However, their effectiveness in real-world applications is often limited by scarce training data and the inherent complexity of these tasks, which frequently require long-tailed knowledge covering rare, unseen scenarios. We propose RAG-GUI, a lightweight VLM that leverages web tutorials at inference time. RAG-GUI is first warm-started via supervised fine-tuning (SFT) and further refined through self-guided rejection sampling fine-tuning (RSF). Designed to be model-agnostic, RAG-GUI functions as a generic plug-in that enhances any VLM-based agent. Evaluated across three distinct tasks, it consistently outperforms baseline agents and surpasses other inference baselines by 2.6% to 13.3% across two model sizes, demonstrating strong generalization and practical plug-and-play capabilities in real-world scenarios.
Anthology ID:
2025.emnlp-main.902
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
17877–17886
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.902/
Cite (ACL):
Ran Xu, Kaixin Ma, Wenhao Yu, Hongming Zhang, Joyce C. Ho, Carl Yang, and Dong Yu. 2025. Retrieval-augmented GUI Agents with Generative Guidelines. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 17877–17886, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Retrieval-augmented GUI Agents with Generative Guidelines (Xu et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.902.pdf
Checklist:
 2025.emnlp-main.902.checklist.pdf