MaDS: Long-Horizon GUI Automation via Synergizing Dual-Layer Memory and Multi-Round Debate

Pengchen Chen, Shi Chen, Qiming Ye, Xinli Chen, Xinran Li, Wei Xiang


Abstract
Automating Graphical User Interface (GUI) operations with Multimodal Large Language Models (MLLMs) is promising but remains bottlenecked in real-world long-horizon settings. Key challenges include ensuring precise grounding across diverse interfaces and handling irreversible errors in extended workflows. Current methods often struggle to distinguish targets in low Signal-to-Noise Ratio (SNR) environments and lack sufficient pre-execution verification to prevent error accumulation. To address these challenges, we propose the Memory-augmented Debate System (MaDS). Specifically, MaDS combines: (1) a Dual-Layer Memory Module that integrates universal interaction priors with scenario-specific operational experience to mitigate grounding hallucinations; and (2) Multi-Round Debate that performs pre-execution verification and transforms execution failures into retrievable Negative Warnings to reduce repeated errors. Additionally, we introduce MaDS-Benchmark, a benchmark for long-horizon mobile GUI tasks with process-oriented evaluation. Experiments show that MaDS achieves a 90.23% Task Success Rate on MaDS-Benchmark and strong performance on public benchmarks including AITW, AITZ, CAGUI, and GUIOdyssey.
Anthology ID:
2026.acl-long.1202
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
26158–26180
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1202/
Cite (ACL):
Pengchen Chen, Shi Chen, Qiming Ye, Xinli Chen, Xinran Li, and Wei Xiang. 2026. MaDS: Long-Horizon GUI Automation via Synergizing Dual-Layer Memory and Multi-Round Debate. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 26158–26180, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
MaDS: Long-Horizon GUI Automation via Synergizing Dual-Layer Memory and Multi-Round Debate (Chen et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1202.pdf
Checklist:
 2026.acl-long.1202.checklist.pdf