@inproceedings{shen-etal-2024-smartcal,
    title = "{SMARTCAL}: An Approach to Self-Aware Tool-Use Evaluation and Calibration",
    author = "Shen, Yuanhao  and
      Zhu, Xiaodan  and
      Chen, Lei",
    editor = "Dernoncourt, Franck  and
      Preo{\c{t}}iuc-Pietro, Daniel  and
      Shimorina, Anastasia",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track",
    month = nov,
    year = "2024",
    address = "Miami, Florida, US",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2024.emnlp-industry.59/",
    doi = "10.18653/v1/2024.emnlp-industry.59",
    pages = "774--789",
    abstract = "The tool-use ability of Large Language Models (LLMs) has a profound impact on a wide range of applications. However, LLMs' self-awareness and self-control capability in appropriately using tools remains understudied. The problem is consequential as it alarms a potential risk of degraded performance and poses a threat to trustworthiness on the models. In this paper, we conduct a study on a family of state-of-the-art LLMs on three datasets with two mainstream tool-use frameworks. Our study reveals the tool-abuse behavior of LLMs, a tendency for models to misuse tools along with models' frequent overconfidence in tool choice. We also find that this is a common issue regardless of model capability. Accordingly, we propose a novel framework, SMARTCAL, to mitigate the observed issues, and our results show an average 8.6 percent increase in the QA performance in three testing datasets and 21.6 percent lower Expected Calibration Error (ECE) than existing methods."
}