DAC-Bench: A Decision-Aware Benchmark for Compositional Mobile GUI Tasks

Yuqing Zhang, Honghui Sheng, Xueyu Hu, Shengyu Zhang, Fei Wu


Abstract
Mobile GUI agents powered by large multimodal models (LMMs) can perceive screens and follow instructions, yet existing benchmarks largely target short, linear workflows and step-level accuracy, offering limited insight into long-horizon planning and decision-making under branching structures. We present DAC-Bench, a decision-aware benchmark of compositional tasks comprising 830 episodes and 11,345 action steps across 35 applications on Android and iOS. Tasks are organized into Sequential, Conjunctive, Conditional, and Hierarchical structures, reflecting real-world multi-step and branching interaction patterns. To complement standard step-level evaluation, we introduce a weighted longest common subsequence (LCS) metric that captures length-sensitive progress, and a decision accuracy metric that measures branch correctness. Evaluations of 7 diverse agents show substantial performance degradation compared to prior benchmarks, with success rates dropping below 5% on 6–8 step tasks and branch accuracy averaging 38%, highlighting challenges in conditional decision-making. By exposing these failure modes, DAC-Bench provides a challenging and diagnostic benchmark for advancing decision-aware mobile GUI agents. Our code and dataset are available at: https://github.com/YuqingZhangMirror12/DAC-Bench.
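The abstract names a weighted LCS progress metric and a decision accuracy metric but does not give their formulas. The sketch below is one plausible reading under stated assumptions, not the authors' definition: it assumes each ground-truth step carries a weight, that the progress score is the largest total weight of ground-truth steps matched by a common subsequence divided by the total weight, and that branch accuracy is the fraction of branch points where the agent took the ground-truth branch. The function names, action labels and the linear weighting in the usage example are hypothetical.

```python
from typing import Hashable, Sequence


def weighted_lcs_score(
    predicted: Sequence[Hashable],
    reference: Sequence[Hashable],
    weights: Sequence[float],
) -> float:
    """Hypothetical weighted-LCS progress score in [0, 1].

    ASSUMPTION: weights[j] is the weight of reference step j, and the score is
    the maximum total weight of reference steps matched by a common
    subsequence of predicted and reference, normalized by the total weight.
    """
    if len(weights) != len(reference):
        raise ValueError("need exactly one weight per reference step")
    total = sum(weights)
    if total <= 0:
        return 0.0
    n, m = len(predicted), len(reference)
    # dp[i][j]: best matched weight using predicted[:i] and reference[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Skip one action from either sequence...
            best = max(dp[i - 1][j], dp[i][j - 1])
            # ...or match the two actions and collect the reference weight.
            if predicted[i - 1] == reference[j - 1]:
                best = max(best, dp[i - 1][j - 1] + weights[j - 1])
            dp[i][j] = best
    return dp[n][m] / total


def branch_accuracy(chosen: Sequence[str], expected: Sequence[str]) -> float:
    """Hypothetical decision accuracy: share of branch points taken correctly.

    ASSUMPTION: chosen[k] and expected[k] label the branch the agent took and
    the ground-truth branch at the k-th branch point of an episode.
    """
    if len(chosen) != len(expected):
        raise ValueError("need one chosen branch per branch point")
    if not expected:
        raise ValueError("no branch points to score")
    correct = sum(c == e for c, e in zip(chosen, expected))
    return correct / len(expected)


# Usage: the agent swaps two steps, so at most three of four steps can match.
reference = ["open_app", "search", "tap_item", "add_to_cart"]
predicted = ["open_app", "tap_item", "search", "add_to_cart"]
# Hypothetical linear weighting: later steps count more.
print(weighted_lcs_score(predicted, reference, [1, 2, 3, 4]))  # 0.8
```

With uniform weights this reduces to plain LCS length divided by the reference length; a position-dependent weighting is one way a metric could become "length-sensitive", but the paper's actual weighting may differ.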
Anthology ID: 2026.acl-long.2064
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 44578–44598
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2064/
Cite (ACL): Yuqing Zhang, Honghui Sheng, Xueyu Hu, Shengyu Zhang, and Fei Wu. 2026. DAC-Bench: A Decision-Aware Benchmark for Compositional Mobile GUI Tasks. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 44578–44598, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): DAC-Bench: A Decision-Aware Benchmark for Compositional Mobile GUI Tasks (Zhang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2064.pdf
Checklist: 2026.acl-long.2064.checklist.pdf