Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark

Xiao Liu, Jianfeng Lin, Jiawei Zhang


Abstract
The proliferation of Large Language Models (LLMs) like ChatGPT has significantly advanced language understanding and generation, impacting a broad spectrum of applications. However, these models predominantly excel in text-based tasks, overlooking the complexity of real-world multimodal information. This study introduces MultiAPI, a pioneering, comprehensive, large-scale API benchmark dataset aimed at expanding LLMs’ proficiency in multimodal contexts. Developed collaboratively with ChatGPT, MultiAPI consists of 187 diverse API calls and 1,799 contextual prompts, offering a unique platform for evaluating tool-augmented LLMs on multimodal tasks. Our experiments show that while LLMs are proficient at deciding whether to make an API call, they struggle with domain identification, function selection, and argument generation. Surprisingly, we also find that auxiliary context can impair performance. An in-depth error analysis points toward a new paradigm for addressing these challenges, suggesting a potential direction for future LLM research.
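The four evaluation levels named in the abstract (API call decision, domain identification, function selection, argument generation) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the benchmark's actual code: the `ApiCall` structure, the `score_prediction` helper, the example domain and function names, and the choice to count a later level as correct only when every earlier level is correct.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ApiCall:
    """Hypothetical representation of one tool call (not the MultiAPI schema)."""
    domain: str
    function: str
    arguments: dict = field(default_factory=dict)


def score_prediction(predicted: Optional[ApiCall],
                     reference: Optional[ApiCall]) -> dict:
    """Compare a predicted call with a reference call, level by level.

    None means "no API call is needed". Cascaded scoring (a level counts
    only if all earlier levels match) is an assumption of this sketch.
    """
    # Level 1: did the model correctly decide whether to call an API at all?
    scores = {"decision": (predicted is None) == (reference is None)}
    if predicted is None or reference is None:
        return scores

    # Level 2: correct domain (e.g. image vs. audio).
    scores["domain"] = predicted.domain == reference.domain
    # Level 3: correct function within that domain.
    scores["function"] = (scores["domain"]
                          and predicted.function == reference.function)
    # Level 4: exact match on the generated arguments.
    scores["arguments"] = (scores["function"]
                           and predicted.arguments == reference.arguments)
    return scores


# Illustrative usage with made-up domain and function names.
reference = ApiCall("image", "image_captioning", {"image_path": "cat.png"})
predicted = ApiCall("image", "object_detection", {"image_path": "cat.png"})
result = score_prediction(predicted, reference)
```

In this example the prediction gets the call decision and the domain right but picks the wrong function, so the argument level is also marked incorrect under the cascaded-scoring assumption.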
Anthology ID:
2024.knowllm-1.4
Volume:
Proceedings of the 1st Workshop on Towards Knowledgeable Language Models (KnowLLM 2024)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Sha Li, Manling Li, Michael JQ Zhang, Eunsol Choi, Mor Geva, Peter Hase, Heng Ji
Venues:
KnowLLM | WS
Publisher:
Association for Computational Linguistics
Pages:
32–44
URL:
https://aclanthology.org/2024.knowllm-1.4
Cite (ACL):
Xiao Liu, Jianfeng Lin, and Jiawei Zhang. 2024. Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark. In Proceedings of the 1st Workshop on Towards Knowledgeable Language Models (KnowLLM 2024), pages 32–44, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark (Liu et al., KnowLLM-WS 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.knowllm-1.4.pdf