FedMABench: Benchmarking Mobile GUI Agents on Decentralized Heterogeneous User Data

WenHao Wang, Zijie Yu, Rui Ye, Jianqing Zhang, Guangyi Liu, Liang Liu, Siheng Chen, Yanfeng Wang


Abstract
Mobile GUI agents have recently attracted considerable research attention. Traditional approaches to mobile agent training rely on centralized data collection, leading to high cost and limited scalability. Distributed training with federated learning offers an alternative that harnesses real-world user data, improving scalability and reducing costs. However, pivotal challenges, including the absence of standardized benchmarks, hinder progress in this field. To address these challenges, we introduce FedMABench, the first benchmark for federated training and evaluation of mobile GUI agents, specifically designed for heterogeneous scenarios. FedMABench features 6 datasets with 30+ subsets, 8 federated algorithms, 10+ base models, and over 800 apps across 5 categories, providing a comprehensive framework for evaluating mobile agents across diverse environments. Through extensive experiments, we uncover several key insights: federated algorithms consistently outperform local training; the distribution of specific apps plays a crucial role in heterogeneity; and even apps from distinct categories can exhibit correlations during training. FedMABench is publicly available at: https://github.com/wwh0411/FedMABench.
Anthology ID:
2025.emnlp-main.1341
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
26398–26419
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1341/
Cite (ACL):
WenHao Wang, Zijie Yu, Rui Ye, Jianqing Zhang, Guangyi Liu, Liang Liu, Siheng Chen, and Yanfeng Wang. 2025. FedMABench: Benchmarking Mobile GUI Agents on Decentralized Heterogeneous User Data. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 26398–26419, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
FedMABench: Benchmarking Mobile GUI Agents on Decentralized Heterogeneous User Data (Wang et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1341.pdf
Checklist:
2025.emnlp-main.1341.checklist.pdf