Di Zhao

2025

COLA: Collaborative Multi-Agent Framework with Dynamic Task Scheduling for GUI Automation
Di Zhao | Longhui Ma | Siwei Wang | Miao Wang | Zhao Lv
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

With the rapid advancement of Large Language Models (LLMs), a growing number of studies have used LLMs as the cognitive core of agents that tackle complex decision-making tasks. Specifically, recent research has demonstrated the potential of LLM-based agents for automating GUI operations. However, existing methods face two critical challenges: (1) static agent architectures struggle to adapt to diverse GUI application scenarios, leading to poor generalization across scenarios; (2) agent workflows lack a fault-tolerance mechanism, so a single decision error by a GUI agent requires re-executing the entire process. To address these limitations, we introduce COLA, a collaborative multi-agent framework for automating GUI operations. In this framework, a scenario-aware Task Scheduler agent decomposes task requirements into atomic capability units and dynamically selects the optimal agent from a decision agent pool, effectively responding to the capability requirements of diverse scenarios. Furthermore, we develop an interactive backtracking mechanism that enables humans to intervene and trigger state rollbacks for non-destructive process repair. Experiments on the GAIA dataset show that COLA achieves competitive performance among GUI Agent methods, with an average accuracy of 31.89%. On WindowsAgentArena, it performs particularly well in Web Browser (33.3%), Media & Video (33.3%), and Windows Utils (25.0%), suggesting the effectiveness of specialized agent design and dynamic strategy allocation. The code is available at https://github.com/Alokia/COLA-demo.

2024

A Multi-Task Biomedical Named Entity Recognition Method Based on Data Augmentation
Hui Zhao | Di Zhao | Jiana Meng | Shuang Liu | Hongfei Lin
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

The rapid development of artificial intelligence has led to an explosion of literature in the biomedical field, and Biomedical Named Entity Recognition (BioNER) can quickly and accurately identify key information from unstructured text. This task has become an important topic to promote the rapid development of intelligence in the biomedical field. However, in the Named Entity Recognition (NER) of the biomedical field, there are always some problems of unclear boundary recognition, the underutilization of hierarchical information in sentences and the scarcity of training data resources. Based on this, this paper proposes a multi-task BioNER model based on data augmentation, using four data augmentation methods: Mention Replacement (MR), Label-wise token Replacement (LwTR), Shuffle Within Segments (SiS) and Synonym Replacement (SR) to increase the training data. The syntactic information is extracted by incorporating the input sentence into the Graph Convolutional Network (GCN), and then the tag information encoded by BERT is interacted through a co-attention mechanism to obtain an interaction matrix. Subsequently, NER is performed through boundary detection tasks and span classification tasks. Comparative experiments with other methods are conducted on the BC5CDR and JNLPBA datasets, as well as the CCKS2017 dataset. The experimental results demonstrate the effectiveness of the model proposed in this paper.

Co-authors

Siwei Wang 1

Miao Wang 1

Hui Zhao (赵晖) 1

Venues

ccl 1
emnlp 1
