Yong Liu
2026
LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark
Guangyi Liu | Pengxiang Zhao | Liang Liu | Zhiming Chen | Yuxiang Chai | Yaozhen Liang | WenHao Wang | Siheng Chen | Zhengxi Lu | Shuai Ren | Hao Wang | Shibo He | Yong Liu | Wenchao Meng
Findings of the Association for Computational Linguistics: ACL 2026
Mobile GUI agents show promise in automating tasks but face significant generalization challenges in long-tail scenarios. While learning from few-shot demonstrations is an emerging solution, its progress is hindered by two critical gaps: the lack of a comprehensive benchmark for systematic evaluation on mobile devices, and the absence of a systematic framework designed to learn from demonstrations in this domain. To address these gaps, we introduce LearnGUI, the first comprehensive benchmark designed for studying demonstration-based learning in mobile agents, comprising 2,252 offline and 101 online tasks. We further develop LearnAct, a modular agent framework engineered to systematically extract, retrieve, and leverage knowledge from visual demonstrations. Extensive evaluations across six backbone models validate our approach: LearnAct achieves dramatic improvements for general-purpose models (e.g., Gemini-2.5-Pro: 38.5%→58.9%) and specialized models alike (e.g., UI-TARS-7B-SFT’s online success rate: 18.1%→32.8%), demonstrating consistent gains across model architectures. Our work provides a robust benchmark and a systematic framework, paving the way for more adaptable and practical mobile agents. Our code and data are publicly available at https://lgy0404.github.io/LearnAct/.
The Agent’s First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios
Daocheng Fu | Jianbiao Mei | Rong Wu | Xuemeng Yang | Jia Xu | Ding Wang | Pinlong Cai | Yong Liu | Licheng Wen | Botian Shi
Findings of the Association for Computational Linguistics: ACL 2026
The rapid evolution of Multi-modal Large Language Models (MLLMs) has advanced workflow automation; however, existing research mainly targets performance upper bounds in static environments, overlooking robustness for stochastic real-world deployment. We identify three key challenges: dynamic task scheduling, active exploration under uncertainty, and continuous learning from experience. To bridge this gap, we introduce TraineeBench, a dynamic evaluation environment that simulates a "trainee" agent continuously exploring a novel setting. Unlike traditional benchmarks, TraineeBench evaluates agents along three dimensions: (1) context-aware scheduling for streaming tasks with varying priorities; (2) prudent information acquisition to reduce hallucination via active exploration; and (3) continuous evolution by distilling generalized strategies from rule-based, dynamically generated tasks. Experiments show that cutting-edge agents have significant deficiencies in dynamic environments, especially in active exploration and continual learning. Our work establishes a framework for assessing agent reliability, shifting evaluation from static tests to realistic, production-oriented scenarios.
MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents
Pengxiang Zhao | Guangyi Liu | Yaozhen Liang | Weiqing He | Zhengxi Lu | WenHao Wang | Yuehao Huang | Yuxiang Chai | Zhaolu Kang | Yaxuan Guo | Hao Wang | Kexin Zhang | Liang Liu | Yong Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Shortcuts such as APIs and deep-links have emerged as efficient complements to flexible GUI operations, fostering a promising hybrid paradigm for MLLM-based mobile automation. However, systematic evaluation of GUI-shortcut hybrid agents remains largely underexplored. To bridge this gap, we introduce MAS-Bench, a benchmark that pioneers the evaluation of GUI-shortcut hybrid agents with a specific focus on the mobile domain. Beyond merely using predefined shortcuts, MAS-Bench assesses an agent’s capability to autonomously generate shortcuts by discovering and creating reusable, low-cost workflows. It features 139 complex tasks across 11 real-world applications, a knowledge base of 88 predefined shortcuts (APIs, deep-links, RPA scripts), and 9 evaluation metrics. Experiments demonstrate that hybrid agents achieve a success rate of up to 68.3% and 39% greater execution efficiency than GUI-only counterparts. Furthermore, our evaluation framework effectively reveals the quality gap between predefined and agent-generated shortcuts, validating its capability to assess shortcut generation methods. MAS-Bench addresses the lack of systematic benchmarks for GUI-shortcut hybrid mobile agents, providing a foundational platform for future advancements in creating more efficient and robust intelligent agents.
2024
Structured Optimal Brain Pruning for Large Language Models
Jiateng Wei | Quan Lu | Ning Jiang | Siqi Li | Jingyang Xiang | Jun Chen | Yong Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
The massive parameter counts and computational demands of Large Language Models (LLMs) hinder their widespread application. Network pruning provides a practical solution to this problem. However, existing pruning methods for LLMs mainly focus on unstructured pruning or require post-pruning fine-tuning. The former relies on special hardware to accelerate computation, while the latter may need substantial computational resources. In this paper, we introduce a retraining-free structured pruning method called SoBP (Structured Optimal Brain Pruning). It leverages global first-order information to select pruning structures, then refines them with a local greedy approach, and finally adopts module-wise reconstruction to mitigate information loss. We assess the effectiveness of SoBP across 14 models from 3 LLM families on 8 distinct datasets. Experimental results demonstrate that SoBP outperforms current state-of-the-art methods.
Co-authors
- Yuxiang Chai 2
- Yaozhen Liang 2
- Guangyi Liu 2
- Liang Liu (陆亮) 2
- Zhengxi Lu 2
- Wenhao Wang 2
- Hao Wang 2
- Pengxiang Zhao 2
- Pinlong Cai 1
- Zhiming Chen 1
- Siheng Chen 1
- Jun Chen 1
- Daocheng Fu 1
- Yaxuan Guo 1
- Shibo He 1
- Weiqing He 1
- Yuehao Huang 1
- Ning Jiang 1
- Zhaolu Kang 1
- Siqi Li 1
- Quan Lu 1
- Jianbiao Mei 1
- Wenchao Meng 1
- Shuai Ren 1
- Botian Shi 1
- Ding Wang 1
- Jiateng Wei 1
- Licheng Wen 1
- Rong Wu 1
- Jingyang Xiang 1
- Jia Xu 1
- Xuemeng Yang 1
- Kexin Zhang 1