GUITester: Enabling GUI Agents for Exploratory Defect Discovery

Yifei Gao, Jiang Wu, Xiaoyi Chen, Yifan Yang, Zhe Cui, Tianyi Ma, Jiaming Zhang, Jitao Sang


Abstract
Exploratory GUI testing is essential for software quality but suffers from high manual costs. While Multi-modal Large Language Model (MLLM) agents excel at navigation, they fail to autonomously discover defects due to two core challenges: Goal-Oriented Masking, where agents prioritize task completion over reporting anomalies, and Execution-Bias Attribution, where system defects are misidentified as agent errors. To address these challenges, we first introduce GUITestBench, the first interactive benchmark for this task, featuring 143 tasks across 26 defects. We then propose GUITester, a multi-agent framework that decouples navigation from verification via two modules: (i) a Planning-Execution Module (PEM) that proactively probes for defects via embedded testing intents, and (ii) a Hierarchical Reflection Module (HRM) that resolves attribution ambiguity through interaction history analysis. GUITester achieves an F1-score of 48.90% (Pass@3) on GUITestBench, outperforming state-of-the-art baselines (33.35%). Our work demonstrates the feasibility of autonomous exploratory testing and provides a robust foundation for future GUI quality assurance.
Anthology ID:
2026.findings-acl.946
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
18956–18978
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.946/
Cite (ACL):
Yifei Gao, Jiang Wu, Xiaoyi Chen, Yifan Yang, Zhe Cui, Tianyi Ma, Jiaming Zhang, and Jitao Sang. 2026. GUITester: Enabling GUI Agents for Exploratory Defect Discovery. In Findings of the Association for Computational Linguistics: ACL 2026, pages 18956–18978, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
GUITester: Enabling GUI Agents for Exploratory Defect Discovery (Gao et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.946.pdf
Checklist:
2026.findings-acl.946.checklist.pdf