Chenyang Liao
2025
AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments
Zhiheng Xi | Yiwen Ding | Wenxiang Chen | Boyang Hong | Honglin Guo | Junzhe Wang | Xin Guo | Dingwen Yang | Chenyang Liao | Wei He | Songyang Gao | Lu Chen | Rui Zheng | Yicheng Zou | Tao Gui | Qi Zhang | Xipeng Qiu | Xuanjing Huang | Zuxuan Wu | Yu-Gang Jiang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have emerged as a promising foundation for building generally capable agents (LLM-based agents) that can handle multi-turn decision-making tasks across various environments. However, the community lacks a unified interactive framework that covers diverse environments for comprehensive evaluation of agents and enables exploration and learning for their self-improvement. To address this, we propose AgentGym, a framework featuring 7 real-world scenarios, 14 environments, and 89 tasks for unified, real-time, and concurrent agent interaction. We construct an expanded instruction set, high-quality trajectories, and a comprehensive benchmarking suite for developing LLM-based agents. Moreover, AgentGym supports interactive exploration and learning for agents through multi-turn interactions and real-time feedback. Based on AgentGym, we take the initial step toward developing LLM-based agents that can handle diverse tasks via methods such as self-improvement or reinforcement learning. Experimental results show that the trained agents can achieve results comparable to commercial models. We hope our work can help the community develop more advanced LLM-based agents. We release the code, dataset, benchmark, and checkpoints at https://agentgym.github.io/.
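The abstract describes agents that act over multiple turns and learn from real-time environment feedback. The sketch below illustrates that interaction loop in the abstract's own terms; it is a minimal toy, and every name in it (ToyEnv, ToyAgent, run_episode) is a hypothetical placeholder, not AgentGym's actual API.

```python
# Illustrative sketch only: a minimal multi-turn agent-environment loop of the
# kind the abstract describes. ToyEnv, ToyAgent, and run_episode are
# hypothetical placeholders and do not reflect AgentGym's real interfaces.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyEnv:
    """Tiny text environment: the goal is to open a closed door."""
    turns: int = 0

    def reset(self) -> str:
        self.turns = 0
        return "You stand before a closed door."

    def step(self, action: str) -> Tuple[str, float, bool]:
        # Returns (observation, reward, done) as real-time feedback.
        self.turns += 1
        if "open" in action.lower():
            return "The door opens.", 1.0, True
        return "Nothing happens.", 0.0, self.turns >= 5


class ToyAgent:
    """Stand-in for an LLM-based agent mapping observations to actions."""
    def act(self, observation: str) -> str:
        return "open the door" if "door" in observation else "look around"


def run_episode(agent: ToyAgent, env: ToyEnv) -> List[Tuple[str, str, float]]:
    """Roll out one episode, collecting (observation, action, reward) steps."""
    obs, done, trajectory = env.reset(), False, []
    while not done:
        action = agent.act(obs)                    # agent decides from current feedback
        next_obs, reward, done = env.step(action)  # environment responds in real time
        trajectory.append((obs, action, reward))
        obs = next_obs
    return trajectory  # trajectories like this feed evaluation or self-improvement


print(run_episode(ToyAgent(), ToyEnv()))
```

In AgentGym, collections of such trajectories serve both as a benchmarking signal and as training data for self-improvement or reinforcement learning, as the abstract outlines.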