Gang Wu
2026
Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs
Paiheng Xu | Gang Wu | Xiang Chen | Tong Yu | Chang Xiao | Franck Dernoncourt | Tianyi Zhou | Wei Ai | Viswanathan Swaminathan
Findings of the Association for Computational Linguistics: EACL 2026
Scripting interfaces enable users to automate tasks and customize software workflows, but creating scripts traditionally requires programming expertise and familiarity with specific APIs, posing barriers for many users. While Large Language Models (LLMs) can generate code from natural language queries, runtime code generation is severely limited due to unverified code, security risks, longer response times, and higher computational costs. To bridge the gap, we propose an offline simulation framework to curate a software-specific skillset—a collection of verified scripts—by exploiting LLMs and publicly available scripting guides. Our framework comprises two components: (1) task creation, using top-down functionality guidance and bottom-up API synergy exploration to generate helpful tasks; and (2) skill generation with trials, refining and validating scripts based on execution feedback. To efficiently navigate the extensive API landscape, we introduce a Graph Neural Network (GNN)-based link prediction model to capture API synergy, enabling the generation of skills involving underutilized APIs and expanding the skillset’s diversity. Experiments with Adobe Illustrator demonstrate that our framework significantly improves automation success rates, reduces response time, and saves runtime token costs compared to traditional runtime code generation. This is the first attempt to use software scripting interfaces as a testbed for LLM-based systems, highlighting the advantages of leveraging execution feedback in a controlled environment and offering valuable insights into aligning AI capabilities with user needs in specialized software domains.
GraphicWeaver: Benchmarking Agentic Planning for Graphic Design Generation
Dayeon Ki | Tianyi Zhou | Marine Carpuat | Gang Wu | Puneet Mathur | Viswanathan Swaminathan
Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR)
Vision-language model (VLM)-powered agents are increasingly enabling new forms of automation across various human tasks. While prior work has primarily focused on well-defined problems with explicit goals, the capabilities of agents in creative graphic design, where goals are inherently open-ended and subjective, remain largely underexplored. To bridge this gap, we introduce GraphicWeaver, a planning benchmark for graphic design comprising 1,079 diverse user queries and associated images spanning four design categories. Comprehensive experiments with six models reveal that current VLM-based agents struggle to handle such complex planning tasks, which require taking into account both explicit design constraints specified in queries and implicit commonsense design principles. We attribute these failures to challenges in (1) retrieving appropriate parameters for tool usage, (2) understanding spatial relationships across design components, and (3) coordinating dependencies across agents. We envision GraphicWeaver as a challenging yet valuable testbed for advancing VLM agent planning in creative design contexts.
2025
GUI Agents: A Survey
Dang Nguyen | Jian Chen | Yu Wang | Gang Wu | Namyong Park | Zhengmian Hu | Hanjia Lyu | Junda Wu | Ryan Aponte | Yu Xia | Xintong Li | Jing Shi | Hongjie Chen | Viet Dac Lai | Zhouhang Xie | Sungchul Kim | Ruiyi Zhang | Tong Yu | Mehrab Tanjim | Nesreen K. Ahmed | Puneet Mathur | Seunghyun Yoon | Lina Yao | Branislav Kveton | Jihyung Kil | Thien Huu Nguyen | Trung Bui | Tianyi Zhou | Ryan A. Rossi | Franck Dernoncourt
Findings of the Association for Computational Linguistics: ACL 2025
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems via GUIs, emulating human actions such as clicking, typing, and navigating visual elements across diverse platforms. Motivated by the growing interest in, and fundamental importance of, GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods. We propose a unified framework that delineates their perception, reasoning, planning, and acting capabilities. Furthermore, we identify important open challenges and discuss key future directions. Finally, this work serves as a basis for practitioners and researchers to gain an intuitive understanding of current progress, techniques, benchmarks, and critical open problems that remain to be addressed.
GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration
Yue Fan | Handong Zhao | Ruiyi Zhang | Yu Shen | Xin Eric Wang | Gang Wu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Graphical User Interface (GUI) action grounding, mapping language instructions to actionable elements on GUI screens, is important for assisting users in interactive tutorials, task automation, accessibility support, etc. Most recent work on GUI action grounding uses large GUI datasets to fine-tune Multimodal Large Language Models (MLLMs). However, the fine-tuning data is inherently limited to specific GUI environments, leading to significant performance degradation in novel environments due to the generalization challenges in the GUI domain. Therefore, we argue that GUI action grounding models should be further aligned with novel environments before deployment to optimize their performance. To address this, we first propose GUI-Bee, an MLLM-based autonomous agent, to collect high-quality, environment-specific data through exploration and then continuously fine-tune GUI grounding models with the collected data. To ensure the GUI action grounding models generalize to various screens within the target novel environment after the continuous fine-tuning, we equip GUI-Bee with a novel Q-value-Incentive In-Context Reinforcement Learning (Q-ICRL) algorithm that optimizes exploration efficiency and exploration data quality. In our experiments, we introduce NovelScreenSpot to test how well the data can help align GUI action grounding models to novel environments. Furthermore, we conduct an ablation study to validate the Q-ICRL method in enhancing the efficiency of GUI-Bee.
Co-authors
- Tianyi Zhou 3
- Franck Dernoncourt 2
- Puneet Mathur 2
- Viswanathan Swaminathan 2
- Tong Yu 2
- Ruiyi Zhang 2
- Nesreen K. Ahmed 1
- Wei Ai 1
- Ryan Aponte 1
- Trung Bui 1
- Marine Carpuat 1
- Hongjie Chen 1
- Jian Chen 1
- Xiang Chen 1
- Yue Fan 1
- Zhengmian Hu 1
- Dayeon Ki 1
- Jihyung Kil 1
- Sungchul Kim 1
- Branislav Kveton 1
- Viet Dac Lai 1
- Xintong Li 1
- Hanjia Lyu 1
- Dang Nguyen 1
- Thien Huu Nguyen 1
- Namyong Park 1
- Ryan A. Rossi 1
- Yu Shen 1
- Jing Shi 1
- Mehrab Tanjim 1
- Xin Eric Wang 1
- Yu Wang 1
- Junda Wu 1
- Yu Xia 1
- Chang Xiao 1
- Zhouhang Xie 1
- Paiheng Xu 1
- Lina Yao 1
- Seunghyun Yoon 1
- Handong Zhao 1