LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark

Guangyi Liu; Pengxiang Zhao; Liang Liu (陆亮); Zhiming Chen; Yuxiang Chai; Yaozhen Liang; Wenhao Wang; Siheng Chen; Zhengxi Lu; Shuai Ren; Hao Wang; Shibo He; Yong Liu; Wenchao Meng

Abstract

Mobile GUI agents show promise in automating tasks but face significant generalization challenges in long-tail scenarios. While learning from few-shot demonstrations is an emerging solution, its progress is hindered by two critical gaps: the lack of a comprehensive benchmark for systematic evaluation on mobile devices, and the absence of a systematic framework designed to learn from demonstrations in this domain. To address these gaps, we introduce LearnGUI, the first comprehensive benchmark designed for studying demonstration-based learning in mobile agents, comprising 2,252 offline and 101 online tasks. We further develop LearnAct, a modular agent framework engineered to systematically extract, retrieve, and leverage knowledge from visual demonstrations. Extensive evaluations across six backbone models validate our approach: LearnAct achieves dramatic improvements for general-purpose models (e.g., Gemini-2.5-Pro: 38.5%→58.9%) and specialized models alike (e.g., UI-TARS-7B-SFT’s online success rate: 18.1%→32.8%), demonstrating consistent gains across model architectures. Our work provides a robust benchmark and a systematic framework, paving the way for more adaptable and practical mobile agents. Our code and data are publicly available at https://lgy0404.github.io/LearnAct/.

Anthology ID: 2026.findings-acl.1491
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 29820–29843
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1491/
Cite (ACL): Guangyi Liu, Pengxiang Zhao, Liang Liu, Zhiming Chen, Yuxiang Chai, Yaozhen Liang, WenHao Wang, Siheng Chen, Zhengxi Lu, Shuai Ren, Hao Wang, Shibo He, Yong Liu, and Wenchao Meng. 2026. LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark. In Findings of the Association for Computational Linguistics: ACL 2026, pages 29820–29843, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark (Liu et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1491.pdf
Checklist: 2026.findings-acl.1491.checklist.pdf
