Gang Wu
2026
Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs
Paiheng Xu | Gang Wu | Xiang Chen | Tong Yu | Chang Xiao | Franck Dernoncourt | Tianyi Zhou | Wei Ai | Viswanathan Swaminathan
Findings of the Association for Computational Linguistics: EACL 2026
Scripting interfaces enable users to automate tasks and customize software workflows, but creating scripts traditionally requires programming expertise and familiarity with specific APIs, posing barriers for many users. While Large Language Models (LLMs) can generate code from natural language queries, runtime code generation is severely limited due to unverified code, security risks, longer response times, and higher computational costs. To bridge the gap, we propose an offline simulation framework to curate a software-specific skillset—a collection of verified scripts—by exploiting LLMs and publicly available scripting guides. Our framework comprises two components: (1) task creation, using top-down functionality guidance and bottom-up API synergy exploration to generate helpful tasks; and (2) skill generation with trials, refining and validating scripts based on execution feedback. To efficiently navigate the extensive API landscape, we introduce a Graph Neural Network (GNN)-based link prediction model to capture API synergy, enabling the generation of skills involving underutilized APIs and expanding the skillset’s diversity. Experiments with Adobe Illustrator demonstrate that our framework significantly improves automation success rates, reduces response time, and saves runtime token costs compared to traditional runtime code generation. This is the first attempt to use software scripting interfaces as a testbed for LLM-based systems, highlighting the advantages of leveraging execution feedback in a controlled environment and offering valuable insights into aligning AI capabilities with user needs in specialized software domains.
2025
GUI Agents: A Survey
Dang Nguyen | Jian Chen | Yu Wang | Gang Wu | Namyong Park | Zhengmian Hu | Hanjia Lyu | Junda Wu | Ryan Aponte | Yu Xia | Xintong Li | Jing Shi | Hongjie Chen | Viet Dac Lai | Zhouhang Xie | Sungchul Kim | Ruiyi Zhang | Tong Yu | Mehrab Tanjim | Nesreen K. Ahmed | Puneet Mathur | Seunghyun Yoon | Lina Yao | Branislav Kveton | Jihyung Kil | Thien Huu Nguyen | Trung Bui | Tianyi Zhou | Ryan A. Rossi | Franck Dernoncourt
Findings of the Association for Computational Linguistics: ACL 2025
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems via GUIs, emulating human actions such as clicking, typing, and navigating visual elements across diverse platforms. Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods. We propose a unified framework that delineates their perception, reasoning, planning, and acting capabilities. Furthermore, we identify important open challenges and discuss key future directions. Finally, this work serves as a basis for practitioners and researchers to gain an intuitive understanding of current progress, techniques, benchmarks, and critical open problems that remain to be addressed.
GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration
Yue Fan | Handong Zhao | Ruiyi Zhang | Yu Shen | Xin Eric Wang | Gang Wu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Graphical User Interface (GUI) action grounding, which maps language instructions to actionable elements on GUI screens, is important for assisting users in interactive tutorials, task automation, accessibility support, etc. Most recent work on GUI action grounding uses large GUI datasets to fine-tune Multimodal Large Language Models (MLLMs). However, the fine-tuning data is inherently limited to specific GUI environments, leading to significant performance degradation in novel environments due to the generalization challenges in the GUI domain. Therefore, we argue that GUI action grounding models should be further aligned with novel environments before deployment to optimize their performance. To address this, we first propose GUI-Bee, an MLLM-based autonomous agent, to collect high-quality, environment-specific data through exploration and then continuously fine-tune GUI grounding models with the collected data. To ensure that the GUI action grounding models generalize to various screens within the target novel environment after the continuous fine-tuning, we equip GUI-Bee with a novel Q-value-Incentive In-Context Reinforcement Learning (Q-ICRL) algorithm that optimizes exploration efficiency and exploration data quality. In our experiments, we introduce NovelScreenSpot to test how well the data can help align GUI action grounding models to novel environments. Furthermore, we conduct an ablation study to validate that the Q-ICRL method enhances the efficiency of GUI-Bee.
Co-authors
- Franck Dernoncourt 2
- Tong Yu 2
- Ruiyi Zhang 2
- Tianyi Zhou 2
- Nesreen K. Ahmed 1
- Wei Ai 1
- Ryan Aponte 1
- Trung Bui 1
- Jian Chen 1
- Hongjie Chen 1
- Xiang Chen 1
- Yue Fan 1
- Zhengmian Hu 1
- Jihyung Kil 1
- Sungchul Kim 1
- Branislav Kveton 1
- Viet Dac Lai 1
- Xintong Li 1
- Hanjia Lyu 1
- Puneet Mathur 1
- Dang Nguyen 1
- Thien Huu Nguyen 1
- Namyong Park 1
- Ryan A. Rossi 1
- Yu Shen 1
- Jing Shi 1
- Viswanathan Swaminathan 1
- Mehrab Tanjim 1
- Yu Wang 1
- Xin Eric Wang 1
- Junda Wu 1
- Yu Xia 1
- Chang Xiao 1
- Zhouhang Xie 1
- Paiheng Xu 1
- Lina Yao 1
- Seunghyun Yoon 1
- Handong Zhao 1