Do GUI Grounders Truly Understand UI Elements?

Surgan Jandial, Yinheng Li, Justin Wagle, Kazuhito Koishida


Abstract
Graphical User Interface (GUI) grounding is critical for effective GUI agents. Despite recent progress, key challenges remain: 1) existing grounding models and benchmarks are skewed toward web and mobile environments, neglecting desktop interfaces (especially Windows); and 2) grounding capability is assessed using accuracy on a single "best" instruction per UI element. However, users can refer to a UI element in diverse valid ways (via visual attributes, spatial relations, etc.), and a capable grounding model should produce consistent outputs across such variations. Focusing on desktop environments, we introduce the GUI Grounding Sensitivity Benchmark, which investigates model sensitivity to multiple descriptions of the same UI element. We design an automatic pipeline to generate multiple valid instructions per UI element, and develop nuanced data validation methods, since even frontier models hallucinate when producing a single instruction. Evaluating 12 models reveals that they are reasonably sensitive to such variations and that their performance on existing benchmarks does not reflect their true ability. Building on the insight that a given grounding model struggles more with certain instructions or relations, we introduce the GUI Grounding Diagnosis Agent, which generates challenging instructions using model feedback and iterative refinement. Our agent achieves a high success rate (up to 84%) in generating instructions on which state-of-the-art GUI grounding models fail.
Anthology ID:
2026.findings-eacl.144
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2772–2785
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.144/
Cite (ACL):
Surgan Jandial, Yinheng Li, Justin Wagle, and Kazuhito Koishida. 2026. Do GUI Grounders Truly Understand UI Elements? In Findings of the Association for Computational Linguistics: EACL 2026, pages 2772–2785, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Do GUI Grounders Truly Understand UI Elements? (Jandial et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.144.pdf
Checklist:
 2026.findings-eacl.144.checklist.pdf