@inproceedings{ross-etal-2025-when2call,
    title = "{W}hen2{C}all: When (not) to Call Tools",
    author = "Ross, Hayley  and
      Mahabaleshwarkar, Ameya Sunil  and
      Suhara, Yoshi",
    editor = "Chiruzzo, Luis  and
      Ritter, Alan  and
      Wang, Lu",
    booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.naacl-long.174/",
    doi = "10.18653/v1/2025.naacl-long.174",
    pages = "3391--3409",
    ISBN = "979-8-89176-189-6",
    abstract = "Leveraging external tools is a key feature for modern Language Models (LMs) to expand their capabilities and integrate them into existing systems. However, existing benchmarks primarily focus on the accuracy of tool calling{---}whether the correct tool is called with the correct parameters{---}and less on evaluating when LMs should (not) call tools. We develop a new benchmark, When2Call, which evaluates tool-calling decision-making: when to generate a tool call, when to ask follow-up questions and when to admit the question can{'}t be answered with the tools provided. We find that state-of-the-art tool-calling LMs show significant room for improvement on When2Call, indicating the importance of this benchmark. We also develop a training set for When2Call and leverage the multiple-choice nature of the benchmark to develop a preference optimization training regime, which shows considerably more improvement than traditional fine-tuning. We release the benchmark and training data as well as evaluation scripts."
}

[When2Call: When (not) to Call Tools](https://aclanthology.org/2025.naacl-long.174/) (Ross et al., NAACL 2025)
- Hayley Ross, Ameya Sunil Mahabaleshwarkar, and Yoshi Suhara. 2025. When2Call: When (not) to Call Tools. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3391–3409, Albuquerque, New Mexico. Association for Computational Linguistics.