Gang Wu


2025

pdf bib
GUI Agents: A Survey
Dang Nguyen | Jian Chen | Yu Wang | Gang Wu | Namyong Park | Zhengmian Hu | Hanjia Lyu | Junda Wu | Ryan Aponte | Yu Xia | Xintong Li | Jing Shi | Hongjie Chen | Viet Dac Lai | Zhouhang Xie | Sungchul Kim | Ruiyi Zhang | Tong Yu | Mehrab Tanjim | Nesreen K. Ahmed | Puneet Mathur | Seunghyun Yoon | Lina Yao | Branislav Kveton | Jihyung Kil | Thien Huu Nguyen | Trung Bui | Tianyi Zhou | Ryan A. Rossi | Franck Dernoncourt
Findings of the Association for Computational Linguistics: ACL 2025

Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems via GUIs, emulating human actions such as clicking, typing, and navigating visual elements across diverse platforms. Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods. We propose a unified framework that delineates their perception, reasoning, planning, and acting capabilities. Furthermore, we identify important open challenges and discuss key future directions. Finally, this work serves as a basis for practitioners and researchers to gain an intuitive understanding of current progress, techniques, benchmarks, and critical open problems that remain to be addressed.

2022

pdf bib
BotSIM: An End-to-End Bot Simulation Framework for Commercial Task-Oriented Dialog Systems
Guangsen Wang | Samson Tan | Shafiq Joty | Gang Wu | Jimmy Au | Steven C.h. Hoi
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present BotSIM, a data-efficient end-to-end Bot SIMulation framework for commercial task-oriented dialog (TOD) systems. BotSIM consists of three major components: 1) a Generator that can infer semantic-level dialog acts and entities from bot definitions and generate user queries via model-based paraphrasing; 2) an agenda-based dialog user Simulator (ABUS) to simulate conversations with the dialog agents; 3) a Remediator to analyze the simulated conversations, visualize the bot health reports and provide actionable remediation suggestions for bot troubleshooting and improvement. We demonstrate BotSIM’s effectiveness in end-to-end evaluation, remediation and multi-intent dialog generation via case studies on two commercial bot platforms. BotSIM’s “generation-simulation-remediation” paradigm accelerates the end-to-end bot evaluation and iteration process by: 1) reducing manual test cases creation efforts; 2) enabling a holistic gauge of the bot in terms of NLU and end-to-end performance via extensive dialog simulation; 3) improving the bot troubleshooting process with actionable suggestions. A demo of our system can be found at https://tinyurl.com/mryu74cd and a demo video at https://youtu.be/qLPJm6_UOKY.