Cache me if you Can: an Online Cost-aware Teacher-Student framework to Reduce the Calls to Large Language Models
Ilias Stogiannidis, Stavros Vassos, Prodromos Malakasiotis, Ion Androutsopoulos
Abstract
Prompting Large Language Models (LLMs) performs impressively in zero- and few-shot settings. Hence, small and medium-sized enterprises (SMEs) that can afford neither the cost of creating large task-specific training datasets nor the cost of pretraining their own LLMs are increasingly turning to third-party services that allow them to prompt LLMs. However, such services currently require a payment per call, which becomes a significant operating expense (OpEx). Furthermore, customer inputs are often very similar over time, so SMEs end up prompting LLMs with very similar instances. We propose a framework that reduces the number of LLM calls by caching previous LLM responses and using them to train an inexpensive local model on the SME side. The framework includes criteria for deciding when to trust the local model or call the LLM, and a methodology to tune the criteria and measure the tradeoff between performance and cost. For experimental purposes, we instantiate our framework with two LLMs, GPT-3.5 and GPT-4, and two inexpensive students, a k-NN classifier and a Multi-Layer Perceptron, on two common business tasks, intent recognition and sentiment analysis. Experimental results indicate that significant OpEx savings can be obtained with only slightly lower performance.
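To make the framework described in the abstract concrete, here is a minimal sketch of the online cost-aware teacher-student loop: cache the LLM (teacher) responses, fit an inexpensive local k-NN student on the cache, and call the paid LLM only when the student is not trusted. The names `call_llm` and `embed`, and the vote-based trust criterion, are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np

class CachedTeacherStudent:
    """Sketch of an online teacher-student cache around a paid LLM."""

    def __init__(self, call_llm, embed, k=5, trust_threshold=0.8):
        self.call_llm = call_llm    # paid teacher: text -> label
        self.embed = embed          # text -> fixed-size vector
        self.k = k
        self.trust_threshold = trust_threshold
        self.vectors, self.labels = [], []  # cache of past LLM calls
        self.llm_calls = 0          # track OpEx

    def predict(self, text):
        v = self.embed(text)
        if len(self.labels) >= self.k:
            # k-NN student: majority vote among the k nearest cached inputs.
            dists = np.linalg.norm(np.array(self.vectors) - v, axis=1)
            nearest = np.argsort(dists)[: self.k]
            votes = [self.labels[i] for i in nearest]
            top = max(set(votes), key=votes.count)
            confidence = votes.count(top) / self.k
            if confidence >= self.trust_threshold:
                return top  # trust the local student; no LLM cost incurred
        # Otherwise pay for an LLM call and grow the cache.
        label = self.call_llm(text)
        self.llm_calls += 1
        self.vectors.append(v)
        self.labels.append(label)
        return label
```

Tuning `k` and `trust_threshold` on held-out data is one way to trade accuracy against cost, mirroring the performance-versus-cost tradeoff the paper measures.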
- Anthology ID: 2023.findings-emnlp.1000
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 14999–15008
- URL: https://aclanthology.org/2023.findings-emnlp.1000
- DOI: 10.18653/v1/2023.findings-emnlp.1000
- Cite (ACL): Ilias Stogiannidis, Stavros Vassos, Prodromos Malakasiotis, and Ion Androutsopoulos. 2023. Cache me if you Can: an Online Cost-aware Teacher-Student framework to Reduce the Calls to Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 14999–15008, Singapore. Association for Computational Linguistics.
- Cite (Informal): Cache me if you Can: an Online Cost-aware Teacher-Student framework to Reduce the Calls to Large Language Models (Stogiannidis et al., Findings 2023)
- PDF: https://preview.aclanthology.org/jeptaln-2024-ingestion/2023.findings-emnlp.1000.pdf