A11y-Compressor: A Framework for Enhancing the Efficiency of GUI Agent Observations through Visual Context Reconstruction and Redundancy Reduction
Michito Takeshita, Takuro Kawada, Takumi Ohashi, Shunsuke Kitada, Hitoshi Iyatomi
Abstract
AI agents that interact with graphical user interfaces (GUIs) require effective observation representations for reliable grounding. The accessibility tree is a commonly used text-based format that encodes UI element attributes, but it suffers from redundancy and lacks structural information such as spatial relationships among elements. We propose A11y-Compressor, a framework that transforms linearized accessibility trees into compact and structured representations. Our implementation, Compressed-a11y, applies a lightweight and structured transformation pipeline with modal detection, redundancy reduction, and semantic structuring. Experiments on the OSWorld benchmark show that Compressed-a11y reduces input tokens to 22% of the original while improving task success rates by 5.1 percentage points on average.
- Anthology ID:
- 2026.acl-srw.50
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Santosh T.Y.S.S., Juan Diego Rodriguez, Ona de Gibert
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 563–580
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-srw.50/
- Cite (ACL):
- Michito Takeshita, Takuro Kawada, Takumi Ohashi, Shunsuke Kitada, and Hitoshi Iyatomi. 2026. A11y-Compressor: A Framework for Enhancing the Efficiency of GUI Agent Observations through Visual Context Reconstruction and Redundancy Reduction. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 563–580, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- A11y-Compressor: A Framework for Enhancing the Efficiency of GUI Agent Observations through Visual Context Reconstruction and Redundancy Reduction (Takeshita et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-srw.50.pdf