Weiming Li

2026

HiSA: Hierarchical State Abstraction for Scalable GUI Agents
Weiming Li | Hye-young Paik | Yulei Sui
Findings of the Association for Computational Linguistics: ACL 2026

Multimodal GUI agents generally operate on raw visual and textual observations, which creates a fundamental scalability challenge. While current state-of-the-art frameworks predominantly rely on inference-intensive test-time scaling or on accumulating unbounded raw logs to maintain task coherence, we attribute the underlying bottleneck to insufficient state abstraction. To address this, we propose HiSA, a hierarchical state abstraction approach that actively restructures knowledge, rather than passively retaining historical information, by organizing raw histories into a three-level hierarchy of abstracted steps, refined contexts, and induced patterns. By synthesizing high-dimensional observations into compact semantic states, HiSA decouples reasoning efficacy from context length, enabling precise and scalable decision-making as interaction histories grow. When evaluated on Spider2-V, our approach establishes a new state of the art, achieving a 40.58% success rate while reducing token consumption by 69.85% and monetary costs by 55.10% compared with the best-performing baseline.
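The three-level organization described above can be illustrated with a minimal Python sketch. It rests on a simple rule-based scheme that is an assumption throughout. The class `HierarchicalMemory`, its capacities `max_steps` and `max_contexts`, and the string-truncation "abstraction" are hypothetical placeholders, not the HiSA implementation, which the abstract does not specify. The sketch only shows how folding abstracted steps into refined contexts, and contexts into induced patterns, keeps the material handed to the model far smaller than the raw history.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HierarchicalMemory:
    """Hypothetical three-level memory: abstracted steps, refined contexts, induced patterns.

    All summarization rules here are illustrative placeholders, not HiSA's method.
    """
    max_steps: int = 4      # assumed capacity of the step level
    max_contexts: int = 3   # assumed capacity of the context level
    steps: List[str] = field(default_factory=list)
    contexts: List[str] = field(default_factory=list)
    patterns: List[str] = field(default_factory=list)

    def add(self, action: str, raw_observation: str) -> None:
        # Level 1: store a compact stand-in for the step (here, a truncated
        # observation), never the full raw observation.
        self.steps.append(f"{action}: {raw_observation[:20]}")
        if len(self.steps) > self.max_steps:
            self._refine_steps()

    def _refine_steps(self) -> None:
        # Level 2: fold all but the most recent step into one refined-context entry.
        folded, self.steps = self.steps[:-1], self.steps[-1:]
        self.contexts.append(f"context({len(folded)} steps, last={folded[-1]})")
        if len(self.contexts) > self.max_contexts:
            self._induce_pattern()

    def _induce_pattern(self) -> None:
        # Level 3: replace the accumulated contexts with a single pattern entry.
        self.patterns.append(f"pattern(from {len(self.contexts)} contexts)")
        self.contexts.clear()

    def prompt_entries(self) -> List[str]:
        # Entries that would be passed to the model: patterns, then contexts, then recent steps.
        return self.patterns + self.contexts + self.steps


if __name__ == "__main__":
    mem = HierarchicalMemory()
    for i in range(100):
        mem.add(f"click_{i}", "x" * 1000)  # each raw observation is 1,000 characters
    print(len(mem.prompt_entries()))
```

With the default capacities, 100 steps of 1,000-character observations reduce to 10 short entries (6 patterns, 0 contexts, 4 recent steps). The step and context levels stay bounded by construction. The pattern level still grows, but only once per 16 steps in this toy, so a real system would also need to consolidate patterns.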

Co-authors

Hye-young Paik (1)
Yulei Sui (1)

Venues

Findings (1)
