COLA: Collaborative Multi-Agent Framework with Dynamic Task Scheduling for GUI Automation

Di Zhao; Longhui Ma; Siwei Wang; Miao Wang; Zhao Lv

Abstract

With the rapid advancement of Large Language Models (LLMs), a growing number of studies use LLMs as the cognitive core of agents to tackle complex decision-making tasks. In particular, recent research has demonstrated the potential of LLM-based agents for automating GUI operations. However, existing methods face two critical challenges: (1) static agent architectures struggle to adapt to diverse GUI application scenarios, which limits scenario generalization; (2) agent workflows lack fault-tolerance mechanisms, so a single decision error by the GUI agent forces the entire process to be re-executed. To address these limitations, we introduce COLA, a collaborative multi-agent framework for automating GUI operations. In this framework, a scenario-aware agent, the Task Scheduler, decomposes task requirements into atomic capability units and dynamically selects the optimal agent from a pool of decision agents, allowing the system to meet the capability requirements of diverse scenarios. Furthermore, we develop an interactive backtracking mechanism that lets humans intervene and trigger state rollbacks for non-destructive process repair. Experiments on the GAIA dataset show that COLA achieves competitive performance among GUI agent methods, with an average accuracy of 31.89%. On WindowsAgentArena, it performs particularly well in Web Browser (33.3%), Media & Video (33.3%), and Windows Utils (25.0%), suggesting the effectiveness of specialized agent design and dynamic strategy allocation. The code is available at https://github.com/Alokia/COLA-demo.
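The abstract names two mechanisms: a scheduler that matches atomic capability units to agents in a pool, and a backtracking mechanism that rolls the workflow back to an earlier state instead of restarting it. The Python sketch below illustrates both ideas under simplifying assumptions: capability matching is reduced to set overlap, and rollback is reduced to a stack of state snapshots. All names (`Agent`, `TaskScheduler`, `Workflow`, `select`, `step`, `rollback`) are hypothetical and are not taken from the COLA code release, whose agents are LLM-based and whose scheduling logic is more involved.

```python
import copy
from dataclasses import dataclass
from typing import Callable

# Hypothetical illustration only; not COLA's actual API.


@dataclass
class Agent:
    name: str
    capabilities: set[str]
    act: Callable[[str, dict], dict]  # (subtask, state copy) -> new state


class TaskScheduler:
    """Assumed scoring rule: pick the agent whose declared capabilities
    overlap most with the required ones (ties keep pool order)."""

    def __init__(self, pool: list[Agent]):
        self.pool = pool

    def select(self, required: set[str]) -> Agent:
        return max(self.pool, key=lambda a: len(a.capabilities & required))


class Workflow:
    """Keeps one snapshot per executed step so a human can roll back."""

    def __init__(self, scheduler: TaskScheduler, state: dict):
        self.scheduler = scheduler
        self.history = [copy.deepcopy(state)]  # checkpoint 0 = initial state

    @property
    def state(self) -> dict:
        return self.history[-1]

    def step(self, subtask: str, required: set[str]) -> str:
        agent = self.scheduler.select(required)
        # The agent works on a copy, so earlier checkpoints stay untouched.
        new_state = agent.act(subtask, copy.deepcopy(self.state))
        self.history.append(new_state)
        return agent.name

    def rollback(self, steps: int = 1) -> dict:
        # Discard the last `steps` checkpoints; the initial state is never removed.
        if steps < 1 or steps >= len(self.history):
            raise ValueError("cannot roll back past the initial state")
        del self.history[-steps:]
        return self.state


def browse(subtask: str, state: dict) -> dict:
    state["log"].append(f"web:{subtask}")
    return state


def files(subtask: str, state: dict) -> dict:
    state["log"].append(f"file:{subtask}")
    return state


pool = [
    Agent("WebAgent", {"browser", "search"}, browse),
    Agent("FileAgent", {"filesystem", "office"}, files),
]
wf = Workflow(TaskScheduler(pool), {"log": []})
print(wf.step("open page", {"browser"}))       # WebAgent
print(wf.step("save report", {"filesystem"}))  # FileAgent
wf.rollback(1)                                 # human undoes the last step
print(wf.state["log"])                         # ['web:open page']
```

Keeping full snapshots is the simplest way to make rollback non-destructive; a real GUI system cannot snapshot the desktop itself, so a practical version would more likely record and replay or undo actions.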

Anthology ID:: 2025.emnlp-main.227
Volume:: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: EMNLP
Publisher:: Association for Computational Linguistics
Pages:: 4570–4593
URL:: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.227/
Cite (ACL):: Di Zhao, Longhui Ma, Siwei Wang, Miao Wang, and Zhao Lv. 2025. COLA: Collaborative Multi-Agent Framework with Dynamic Task Scheduling for GUI Automation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 4570–4593, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: COLA: Collaborative Multi-Agent Framework with Dynamic Task Scheduling for GUI Automation (Zhao et al., EMNLP 2025)
PDF:: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.227.pdf
Checklist:: 2025.emnlp-main.227.checklist.pdf