AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction

Hongru Wang; Rui Wang; Boyang Xue; Heming Xia; Jingtao Cao; Zeming Liu; Jeff Z. Pan; Kam-Fai Wong

doi:10.18653/v1/2024.emnlp-main.856

Abstract

Large Language Models (LLMs) can interact with the real world by connecting with versatile external APIs, resulting in better problem-solving and task automation capabilities. Previous research either focuses primarily on APIs with limited arguments from a single source or overlooks the complex dependency relationships between different APIs. However, it is essential to use multiple APIs from various sources collaboratively, especially for complex user instructions. In this paper, we introduce AppBench, the first benchmark to evaluate LLMs’ ability to plan and execute multiple APIs from various sources in order to complete the user’s task. Specifically, we consider two significant challenges in using multiple APIs: 1) graph structures: some APIs can be executed independently while others need to be executed one by one, resulting in a graph-like execution order; and 2) permission constraints: which source is authorized to execute each API call. Experimental results on 9 distinct LLMs (e.g., GPT-4o achieves only a 2.0% success rate on the most complex instructions) reveal that existing state-of-the-art LLMs still cannot perform well in this setting, even with the help of in-context learning and fine-tuning. Our code and data are publicly available at https://github.com/ruleGreen/AppBench.
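The two challenges named in the abstract can be illustrated with a minimal sketch. It is not the benchmark's data format or evaluation code: the app names, API names, and the `plan` and `permissions` structures below are hypothetical, chosen only to show how a graph-like execution order and a per-source authorization check might look.

```python
from graphlib import TopologicalSorter

# Hypothetical plan (not from the paper): each API call names the app
# assumed to execute it and the calls whose outputs it depends on.
plan = {
    "find_restaurant": {"app": "Restaurants", "deps": []},
    "get_weather": {"app": "Weather", "deps": []},
    "reserve_table": {"app": "Restaurants",
                      "deps": ["find_restaurant", "get_weather"]},
    "book_ride": {"app": "RideSharing", "deps": ["reserve_table"]},
}

# Hypothetical permission table: which APIs each app may execute.
permissions = {
    "Restaurants": {"find_restaurant", "reserve_table"},
    "Weather": {"get_weather"},
    "RideSharing": {"book_ride"},
}


def unauthorized_calls(plan, permissions):
    """Return API calls whose assigned app is not authorized to run them."""
    return [api for api, spec in plan.items()
            if api not in permissions.get(spec["app"], set())]


def execution_stages(plan):
    """Group API calls into stages; calls within a stage are independent."""
    sorter = TopologicalSorter({api: spec["deps"] for api, spec in plan.items()})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # calls whose dependencies are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    print(unauthorized_calls(plan, permissions))
    print(execution_stages(plan))
```

In this toy plan, the two lookups have no dependencies and could run in parallel, while the reservation and the ride must follow in sequence; assigning a call to an app that does not own it would be reported by `unauthorized_calls`.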

Anthology ID: 2024.emnlp-main.856
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 15322–15336
URL: https://preview.aclanthology.org/build-pipeline-with-new-library/2024.emnlp-main.856/
DOI: 10.18653/v1/2024.emnlp-main.856
Cite (ACL): Hongru Wang, Rui Wang, Boyang Xue, Heming Xia, Jingtao Cao, Zeming Liu, Jeff Z. Pan, and Kam-Fai Wong. 2024. AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15322–15336, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction (Wang et al., EMNLP 2024)
PDF: https://preview.aclanthology.org/build-pipeline-with-new-library/2024.emnlp-main.856.pdf
