@inproceedings{liu-etal-2024-evaluation-mechanism,
    title     = "An Evaluation Mechanism of {LLM}-based Agents on Manipulating {API}s",
    author    = "Liu, Bing and
      Zhou, Jianxiang and
      Meng, Dan and
      Lu, Haonan",
    editor    = "Al-Onaizan, Yaser and
      Bansal, Mohit and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month     = nov,
    year      = "2024",
    address   = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url       = "https://aclanthology.org/2024.findings-emnlp.267/",
    doi       = "10.18653/v1/2024.findings-emnlp.267",
    pages     = "4649--4662",
    abstract  = "LLM-based agents can greatly extend the abilities of LLMs and thus attract sharply increased studies. An ambitious vision {--} serving users by manipulating massive API-based tools {--} has been proposed and explored. However, we find a widely accepted evaluation mechanism for generic agents is still missing. This work aims to fill this gap. We decompose tool use capability into seven aspects and form a thorough evaluation schema. In addition, we design and release an instruction dataset and a toolset {--} the two sides that the agents bridge between {--} following the principle of reflecting real-world challenges. Furthermore, we evaluate multiple generic agents. Our findings can inspire future research in improving LLM-based agents and rethink the philosophy of API design."
}
Markdown (Informal)
[An Evaluation Mechanism of LLM-based Agents on Manipulating APIs](https://aclanthology.org/2024.findings-emnlp.267/) (Liu et al., Findings 2024)
ACL