MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation

Zichen Zhu, Hao Tang, Yansi Li, Dingye Liu, Hongshen Xu, Kunyao Lan, Danyang Zhang, Yixuan Jiang, Hao Zhou, Chenrun Wang, Situo Zhang, Liangtai Sun, Yixiao Wang, Yuheng Sun, Lu Chen, Kai Yu


Abstract
Existing Multimodal Large Language Model (MLLM)-based agents face significant challenges in handling complex GUI (Graphical User Interface) interactions on mobile devices. These challenges arise from the dynamic and structured nature of GUI environments, which integrate text, images, and spatial relationships, as well as from the variability in action spaces across different pages and tasks. To address these limitations, we propose MobA, a novel MLLM-based mobile assistant system. MobA introduces an adaptive planning module that incorporates a reflection mechanism for error recovery and dynamically adjusts plans to align with real environment contexts and the action module’s execution capacity. In addition, a multifaceted memory module provides comprehensive memory support to enhance adaptability and efficiency. We also present MobBench, a dataset designed for complex mobile interactions. Experimental results on MobBench and AndroidArena demonstrate MobA’s ability to handle dynamic GUI environments and perform complex mobile tasks.
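To make the two modules described above concrete, the following is a minimal, hypothetical sketch of a plan-act-reflect loop with memory support in the spirit of the abstract; every class, method, and parameter name (Planner, MemoryStore, run_task, env) is an illustrative assumption, not MobA's actual interface.

# Minimal, hypothetical sketch of MobA-style adaptive planning with reflection
# and memory support. All names here are illustrative assumptions, not MobA's API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class StepResult:
    ok: bool
    status: str


@dataclass
class MemoryStore:
    """Multifaceted memory: keeps notes about past steps and their outcomes."""
    notes: List[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def recall(self, query: str) -> List[str]:
        # Naive keyword retrieval; a real system could use richer structure or embeddings.
        return [n for n in self.notes if query.lower() in n.lower()]


class Planner:
    """Toy planner: produces an ordered list of GUI actions and can revise it."""

    def make_plan(self, task: str, context: List[str]) -> List[str]:
        # A real MLLM planner would condition on the task, screen state, and memory.
        return [f"open app for '{task}'", f"perform '{task}'", "verify result"]

    def replan(self, task: str, failure: StepResult, context: List[str]) -> List[str]:
        # Reflection step: incorporate the observed failure before retrying.
        return [f"recover from '{failure.status}'", f"retry '{task}'"]


def run_task(task: str, env, planner: Planner, max_steps: int = 20) -> bool:
    """Plan, execute, and reflect until the task succeeds or the step budget runs out."""
    memory = MemoryStore()
    plan = planner.make_plan(task, memory.recall(task))
    for _ in range(max_steps):
        if not plan:
            return True  # plan exhausted: assume the task is complete
        step = plan.pop(0)
        result: StepResult = env.execute(step)
        memory.remember(f"{step} -> {result.status}")
        if not result.ok:
            # Error recovery: revise the remaining plan using memory and the failure.
            plan = planner.replan(task, result, memory.recall(task))
    return False

Here env stands in for whatever backend executes GUI actions on the device; the design point mirrored from the abstract is that failures trigger replanning informed by recalled memory rather than blind retries.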
Anthology ID:
2025.naacl-demo.43
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Nouha Dziri, Sean (Xiang) Ren, Shizhe Diao
Venues:
NAACL | WS
Publisher:
Association for Computational Linguistics
Pages:
535–549
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-demo.43/
Cite (ACL):
Zichen Zhu, Hao Tang, Yansi Li, Dingye Liu, Hongshen Xu, Kunyao Lan, Danyang Zhang, Yixuan Jiang, Hao Zhou, Chenrun Wang, Situo Zhang, Liangtai Sun, Yixiao Wang, Yuheng Sun, Lu Chen, and Kai Yu. 2025. MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations), pages 535–549, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation (Zhu et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-demo.43.pdf