Gang Wu
2025
GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration
Yue Fan | Handong Zhao | Ruiyi Zhang | Yu Shen | Xin Eric Wang | Gang Wu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Graphical User Interface (GUI) action grounding, mapping language instructions to actionable elements on GUI screens, is important for assisting users in interactive tutorials, task automation, and accessibility support. Most recent work on GUI action grounding uses large GUI datasets to fine-tune Multimodal Large Language Models (MLLMs). However, the fine-tuning data is inherently limited to specific GUI environments, leading to significant performance degradation in novel environments due to the generalization challenges of the GUI domain. We therefore argue that GUI action grounding models should be further aligned with novel environments before deployment to optimize their performance. To this end, we first propose GUI-Bee, an MLLM-based autonomous agent, to collect high-quality, environment-specific data through exploration, and then continuously fine-tune GUI grounding models with the collected data. To ensure that the GUI action grounding models generalize to various screens within the target novel environment after continuous fine-tuning, we equip GUI-Bee with a novel Q-value-Incentive In-Context Reinforcement Learning (Q-ICRL) algorithm that optimizes exploration efficiency and exploration data quality. In our experiments, we introduce NovelScreenSpot to test how well the collected data aligns GUI action grounding models to novel environments, and we conduct an ablation study validating that Q-ICRL enhances the efficiency of GUI-Bee.
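As a purely illustrative reading of the Q-ICRL idea, the sketch below maintains Q-value estimates over screen-action pairs and uses them to bias exploration toward high-value actions. All names (`QICRLExplorer`, `epsilon`, `alpha`) and the reward design are assumptions; the abstract does not give the algorithm's exact form.

```python
# A minimal sketch of Q-value-incentivized exploration in the spirit of
# Q-ICRL as described in the abstract. Hypothetical, not the paper's method.
import random
from collections import defaultdict

class QICRLExplorer:
    def __init__(self, epsilon=0.1, alpha=0.5):
        self.q = defaultdict(float)   # Q-value estimate per (screen, action)
        self.epsilon = epsilon        # chance of a purely random probe
        self.alpha = alpha            # step size for Q-value updates

    def select_action(self, screen, candidate_actions):
        # Epsilon-greedy: usually take the action with the highest
        # estimated Q-value, occasionally explore at random.
        if random.random() < self.epsilon:
            return random.choice(candidate_actions)
        return max(candidate_actions, key=lambda a: self.q[(screen, a)])

    def update(self, screen, action, reward):
        # Incremental update; the reward could score whether the action
        # exposed a previously unseen screen in the target environment.
        key = (screen, action)
        self.q[key] += self.alpha * (reward - self.q[key])
```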
GUI Agents: A Survey
Dang Nguyen | Jian Chen | Yu Wang | Gang Wu | Namyong Park | Zhengmian Hu | Hanjia Lyu | Junda Wu | Ryan Aponte | Yu Xia | Xintong Li | Jing Shi | Hongjie Chen | Viet Dac Lai | Zhouhang Xie | Sungchul Kim | Ruiyi Zhang | Tong Yu | Mehrab Tanjim | Nesreen K. Ahmed | Puneet Mathur | Seunghyun Yoon | Lina Yao | Branislav Kveton | Jihyung Kil | Thien Huu Nguyen | Trung Bui | Tianyi Zhou | Ryan A. Rossi | Franck Dernoncourt
Findings of the Association for Computational Linguistics: ACL 2025
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems via GUIs, emulating human actions such as clicking, typing, and navigating visual elements across diverse platforms. Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods. We propose a unified framework that delineates their perception, reasoning, planning, and acting capabilities. Furthermore, we identify important open challenges and discuss key future directions. Finally, this work serves as a basis for practitioners and researchers to gain an intuitive understanding of current progress, techniques, benchmarks, and critical open problems that remain to be addressed.
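The perception, reasoning, planning, and acting capabilities the survey delineates can be pictured as a minimal loop. The sketch below is a hypothetical skeleton, not drawn from any specific system in the survey; all class and method names are illustrative.

```python
# Hypothetical perception -> reasoning/planning -> acting loop for a GUI agent.
from dataclasses import dataclass

@dataclass
class Observation:
    screenshot: bytes   # raw pixels of the current GUI state
    a11y_tree: dict     # accessibility tree, when the platform exposes one

class Agent:
    def perceive(self, obs: Observation) -> list[str]:
        # Perception: extract candidate UI elements from the observation.
        return list(obs.a11y_tree.keys())

    def reason_and_plan(self, task: str, elements: list[str]) -> str:
        # Reasoning + planning: pick the element most relevant to the task.
        return next((e for e in elements if e in task), "noop")

def run_episode(task: str, observations: list[Observation]) -> list[str]:
    # Acting: emit one action per observation, emulating click/type/navigate.
    agent = Agent()
    return [agent.reason_and_plan(task, agent.perceive(o)) for o in observations]
```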
A Survey on Small Language Models
Chien Van Nguyen | Xuan Shen | Ryan Aponte | Yu Xia | Samyadeep Basu | Zhengmian Hu | Jian Chen | Mihir Parmar | Sasidhar Kunapuli | Joe Barrow | Junda Wu | Ashish Singh | Yu Wang | Jiuxiang Gu | Nesreen K. Ahmed | Nedim Lipka | Ruiyi Zhang | Xiang Chen | Tong Yu | Sungchul Kim | Hanieh Deilamsalehy | Namyong Park | Michael Rimer | Zhehao Zhang | Huanrui Yang | Puneet Mathur | Gang Wu | Franck Dernoncourt | Ryan Rossi | Thien Huu Nguyen
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Small Language Models (SLMs) have become increasingly important due to their efficiency and their ability to perform various language tasks with minimal computational resources, making them ideal for many settings, including on-device, mobile, and edge deployments. In this article, we present a comprehensive survey of SLMs, focusing on their architectures, training techniques, and model compression techniques. We propose a novel taxonomy for categorizing the methods used to optimize SLMs, including model compression, pruning, and quantization techniques. We summarize the datasets useful for benchmarking SLMs along with the evaluation metrics commonly used. Additionally, we highlight key open challenges that remain to be addressed. Our survey aims to serve as a valuable resource for researchers and practitioners interested in developing and deploying small yet efficient language models.
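To illustrate one of the compression techniques the survey catalogs, here is a minimal sketch of symmetric per-tensor int8 post-training weight quantization. This is a generic textbook formulation, not a method taken from the survey itself.

```python
# Symmetric int8 post-training weight quantization: map floats to int8
# with a single per-tensor scale, then reconstruct approximately.
import numpy as np

def quantize_int8(w: np.ndarray):
    # Scale so the largest-magnitude weight maps to +/-127.
    scale = np.abs(w).max() / 127.0 if w.size else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print(np.max(np.abs(w - dequantize(q, s))))  # worst-case quantization error
```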
2022
BotSIM: An End-to-End Bot Simulation Framework for Commercial Task-Oriented Dialog Systems
Guangsen Wang | Samson Tan | Shafiq Joty | Gang Wu | Jimmy Au | Steven C.H. Hoi
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
We present BotSIM, a data-efficient end-to-end Bot SIMulation framework for commercial task-oriented dialog (TOD) systems. BotSIM consists of three major components: 1) a Generator that infers semantic-level dialog acts and entities from bot definitions and generates user queries via model-based paraphrasing; 2) an agenda-based dialog user Simulator (ABUS) that simulates conversations with the dialog agents; 3) a Remediator that analyzes the simulated conversations, visualizes bot health reports, and provides actionable remediation suggestions for bot troubleshooting and improvement. We demonstrate BotSIM’s effectiveness in end-to-end evaluation, remediation, and multi-intent dialog generation via case studies on two commercial bot platforms. BotSIM’s “generation-simulation-remediation” paradigm accelerates the end-to-end bot evaluation and iteration process by: 1) reducing manual test case creation effort; 2) enabling a holistic gauge of the bot’s NLU and end-to-end performance via extensive dialog simulation; 3) improving the bot troubleshooting process with actionable suggestions. A demo of our system can be found at https://tinyurl.com/mryu74cd and a demo video at https://youtu.be/qLPJm6_UOKY.
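The “generation-simulation-remediation” paradigm can be sketched as a three-stage pipeline. The class and method names below are illustrative assumptions, not BotSIM’s actual API.

```python
# Hypothetical sketch of a generation -> simulation -> remediation loop
# in the spirit of the BotSIM abstract; not the real BotSIM interface.
class Generator:
    def paraphrase(self, intent_utterances):
        # Model-based paraphrasing would go here; identity for the sketch.
        return list(intent_utterances)

class Simulator:
    def converse(self, bot, query):
        # Agenda-based user simulation: send the query, record the
        # bot's predicted intent.
        return {"query": query, "predicted_intent": bot(query)}

class Remediator:
    def report(self, dialogs, gold_intent):
        # Aggregate simulated dialogs into a simple "health report".
        errors = [d for d in dialogs if d["predicted_intent"] != gold_intent]
        return {"n_dialogs": len(dialogs), "n_errors": len(errors)}

def evaluate_intent(bot, gold_intent, seed_utterances):
    queries = Generator().paraphrase(seed_utterances)
    dialogs = [Simulator().converse(bot, q) for q in queries]
    return Remediator().report(dialogs, gold_intent)

# Toy usage: a "bot" that predicts intent by keyword matching.
toy_bot = lambda q: "refund" if "refund" in q else "other"
print(evaluate_intent(toy_bot, "refund", ["I want a refund", "cancel order"]))
```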
Co-authors
- Ruiyi Zhang 3
- Nesreen K. Ahmed 2
- Ryan Aponte 2
- Franck Dernoncourt 2
- Zhengmian Hu 2
- Sungchul Kim 2
- Puneet Mathur 2
- Thien Huu Nguyen 2
- Namyong Park 2
- Junda Wu 2
- Yu Xia 2
- Tong Yu 2
- Jimmy Au 1
- Joe Barrow 1
- Samyadeep Basu 1
- Trung Bui 1
- Jian Chen 1
- Hongjie Chen 1
- Jian Chen 1
- Xiang Chen 1
- Hanieh Deilamsalehy 1
- Yue Fan 1
- Jiuxiang Gu 1
- Steven C.H. Hoi 1
- Shafiq Joty 1
- Jihyung Kil 1
- Sasidhar Kunapuli 1
- Branislav Kveton 1
- Viet Dac Lai 1
- Xintong Li 1
- Nedim Lipka 1
- Hanjia Lyu 1
- Dang Nguyen 1
- Chien Van Nguyen 1
- Mihir Parmar 1
- Michael Rimer 1
- Ryan A. Rossi 1
- Ryan Rossi 1
- Yu Shen 1
- Xuan Shen 1
- Jing Shi 1
- Ashish Singh 1
- Samson Tan 1
- Mehrab Tanjim 1
- Guangsen Wang 1
- Xin Eric Wang 1
- Yu Wang 1
- Yu Wang 1
- Zhouhang Xie 1
- Huanrui Yang 1
- Lina Yao 1
- Seunghyun Yoon 1
- Zhehao Zhang 1
- Handong Zhao 1
- Tianyi Zhou 1