2025
Multi-Programming Language Sandbox for LLMs
Shihan Dou | Jiazheng Zhang | Jianxiang Zang | Yunbo Tao | Weikang Zhou | Haoxiang Jia | Shichun Liu | Yuming Yang | Shenxi Wu | Zhiheng Xi | Muling Wu | Rui Zheng | Changze Lv | Limao Xiong | Shaoqing Zhang | Lin Zhang | Wenyu Zhan | Rongxiang Weng | Jingang Wang | Xunliang Cai | Yueming Wu | Ming Wen | Yixin Cao | Tao Gui | Xipeng Qiu | Qi Zhang | Xuanjing Huang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
We introduce MPLSandbox, an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler and analysis tools for Large Language Models (LLMs). It automatically identifies the programming language of the code, then compiles and executes it within an isolated sub-sandbox to ensure safety and stability. In addition, MPLSandbox integrates both traditional and LLM-based code analysis tools, providing a comprehensive analysis of generated code. It can be effortlessly integrated into the training and deployment of LLMs to improve the quality and correctness of generated code, and it helps researchers streamline their workflows for various LLM-based code-related tasks, reducing development cost. To validate the effectiveness of MPLSandbox, we conduct extensive experiments by integrating it into several training and deployment scenarios and employing it to optimize workflows for a wide range of downstream code tasks. Our goal is to enhance researcher productivity on LLM-based code tasks by simplifying and automating workflows through delegation to MPLSandbox.
2024
Dynamic Planning for LLM-based Graphical User Interface Automation
Shaoqing Zhang | Zhuosheng Zhang | Kehai Chen | Xinbei Ma | Muyun Yang | Tiejun Zhao | Min Zhang
Findings of the Association for Computational Linguistics: EMNLP 2024
The advent of large language models (LLMs) has spurred considerable interest in advancing autonomous LLM-based agents, particularly in applications within smartphone graphical user interfaces (GUIs). When presented with a task goal, these agents typically emulate human actions within a GUI environment until the task is completed. However, a key challenge lies in devising effective plans to guide action prediction in GUI tasks, even though planning has been widely recognized as effective for decomposing complex tasks into a series of steps. Specifically, given the dynamic nature of environmental GUIs following action execution, it is crucial to dynamically adapt plans based on environmental feedback and action history. We show that the widely used ReAct approach fails due to the excessively long historical dialogues. To address this challenge, we propose a novel approach called Dynamic Planning of Thoughts (D-PoT) for LLM-based GUI agents. D-PoT involves the dynamic adjustment of planning based on the environmental feedback and execution history. Experimental results reveal that the proposed D-PoT significantly surpasses the strong GPT-4V baseline by +12.7% (34.66% → 47.36%) in accuracy. The analysis highlights the generality of dynamic planning across different backbone LLMs, as well as its benefits in mitigating hallucinations and adapting to unseen tasks. Code is available at https://github.com/sqzhang-lazy/D-PoT.