Abstract
LLM-based agents can greatly extend the abilities of LLMs and have thus attracted sharply increasing research interest. An ambitious vision – serving users by manipulating massive API-based tools – has been proposed and explored. However, we find that a widely accepted evaluation mechanism for generic agents is still missing. This work aims to fill that gap. We decompose tool-use capability into seven aspects and form a thorough evaluation schema. In addition, we design and release an instruction dataset and a toolset – the two sides that agents bridge – following the principle of reflecting real-world challenges. Furthermore, we evaluate multiple generic agents. Our findings can inspire future research on improving LLM-based agents and rethinking the philosophy of API design.

- Anthology ID: 2024.findings-emnlp.267
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 4649–4662
- URL: https://preview.aclanthology.org/add_missing_videos/2024.findings-emnlp.267/
- DOI: 10.18653/v1/2024.findings-emnlp.267
- Cite (ACL): Bing Liu, Zhou Jianxiang, Dan Meng, and Haonan Lu. 2024. An Evaluation Mechanism of LLM-based Agents on Manipulating APIs. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 4649–4662, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): An Evaluation Mechanism of LLM-based Agents on Manipulating APIs (Liu et al., Findings 2024)
- PDF: https://preview.aclanthology.org/add_missing_videos/2024.findings-emnlp.267.pdf