PGGA: A Plan-Grounded GUI Agent for Automated Device Support

Lei Hsiung; Zhiyu Chen; Seonhoon Kim; Qun Liu

Abstract

Current GUI agents struggle with multi-step digital device support. We investigate whether this failure is partly caused by a procedural knowledge deficit: agents often rely on zero-shot visual exploration instead of executing verified instructions. To address this, we introduce the Plan-Grounded GUI Agent (PGGA), which frames interface navigation as a knowledge-execution problem by conditioning low-level actions on step-by-step text plans. On our focused Device-Support Interaction Benchmark (DSIB), results reveal a sharp gap between knowing which operation to perform and grounding that operation on the screen: GTA1-7B reaches 99.59% Operation Accuracy with expert plans, but only 82.99% Element Accuracy and 45.61% Task Success Rate; without plans, its Task Success Rate is 0.00%. Our fine-tuned 2B-parameter PGGA achieves 54.39% Task Success Rate and 91.28% Element Accuracy when guided by expert plans, suggesting that explicit procedural grounding can substantially improve GUI execution when high-quality plans are available. Project Page: https://hsiung.cc/PGGA/
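
The gap between step-level accuracy and Task Success Rate reported above can be illustrated with a small sketch. The metric definitions below are assumptions made for illustration only (the abstract does not define them): Operation Accuracy and Element Accuracy are taken as per-step match rates, and a task is counted as successful only if every step matches on both operation and element. The `Step` class, its fields, and the toy data are hypothetical and are not taken from DSIB.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One agent step with hypothetical gold and predicted labels."""
    gold_op: str        # e.g. "tap", "type", "scroll"
    pred_op: str
    gold_element: str   # identifier of the target UI element
    pred_element: str


Task = List[Step]


def operation_accuracy(tasks: List[Task]) -> float:
    """Fraction of steps whose predicted operation matches the gold one."""
    steps = [s for task in tasks for s in task]
    return sum(s.pred_op == s.gold_op for s in steps) / len(steps)


def element_accuracy(tasks: List[Task]) -> float:
    """Fraction of steps whose predicted element matches the gold one."""
    steps = [s for task in tasks for s in task]
    return sum(s.pred_element == s.gold_element for s in steps) / len(steps)


def task_success_rate(tasks: List[Task]) -> float:
    """Fraction of tasks in which every step is correct (assumed definition)."""
    def solved(task: Task) -> bool:
        return all(
            s.pred_op == s.gold_op and s.pred_element == s.gold_element
            for s in task
        )
    return sum(solved(task) for task in tasks) / len(tasks)


# Toy data: two three-step tasks; the first has one mis-grounded element.
tasks = [
    [
        Step("tap", "tap", "settings_icon", "settings_icon"),
        Step("tap", "tap", "wifi_row", "bluetooth_row"),  # right op, wrong element
        Step("tap", "tap", "wifi_toggle", "wifi_toggle"),
    ],
    [
        Step("tap", "tap", "settings_icon", "settings_icon"),
        Step("scroll", "scroll", "settings_list", "settings_list"),
        Step("tap", "tap", "about_row", "about_row"),
    ],
]

print(operation_accuracy(tasks), element_accuracy(tasks), task_success_rate(tasks))
```

Under these assumed definitions, a single grounding error leaves the operation-level score untouched while failing the whole task, which is one way step-level accuracy can stay high as task-level success drops.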

Anthology ID: 2026.alvr-main.9
Volume: Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Qianqi Yan, Syrielle Montariol, Yue Fan, Jing Gu, Jiayi Pan, Manling Li, Parisa Kordjamshidi, Alane Suhr, Xin Eric Wang
Venues: ALVR | WS
Publisher: Association for Computational Linguistics
Pages: 105–114
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.alvr-main.9/
Cite (ACL): Lei Hsiung, Zhiyu Chen, Seonhoon Kim, and Qun Liu. 2026. PGGA: A Plan-Grounded GUI Agent for Automated Device Support. In Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR), pages 105–114, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): PGGA: A Plan-Grounded GUI Agent for Automated Device Support (Hsiung et al., ALVR 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.alvr-main.9.pdf
