On the Robustness of Agentic Function Calling

Ella Rabinovich, Ateret Anaby Tavor


Abstract
Large Language Models (LLMs) are increasingly acting as autonomous agents, with function calling (FC) capabilities enabling them to invoke specific tools for tasks. While prior research has primarily focused on improving FC accuracy, little attention has been given to the robustness of these agents to perturbations in their input. We introduce a benchmark assessing FC robustness in two key areas: resilience to naturalistic query variations, and stability in function calling when the toolkit expands with semantically related tools. Evaluating best-performing FC models on a carefully expanded subset of the Berkeley function calling leaderboard (BFCL), we identify critical weaknesses in existing evaluation methodologies, and highlight areas for improvement in real-world agentic deployments.
Anthology ID:
2025.trustnlp-main.20
Volume:
Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Trista Cao, Anubrata Das, Tharindu Kumarage, Yixin Wan, Satyapriya Krishna, Ninareh Mehrabi, Jwala Dhamala, Anil Ramakrishna, Aram Galstyan, Anoop Kumar, Rahul Gupta, Kai-Wei Chang
Venues:
TrustNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
298–304
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.trustnlp-main.20/
Cite (ACL):
Ella Rabinovich and Ateret Anaby Tavor. 2025. On the Robustness of Agentic Function Calling. In Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025), pages 298–304, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
On the Robustness of Agentic Function Calling (Rabinovich & Anaby Tavor, TrustNLP 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.trustnlp-main.20.pdf