DispatchQA: A Benchmark for Small Function Calling Language Models in E-Commerce Applications

Joachim Daiber, Victor Maricato, Ayan Sinha, Andrew Rabinovich


Abstract
We introduce DispatchQA, a benchmark to evaluate how well small language models (SLMs) translate open-ended search queries into executable API calls via explicit function calling. Our benchmark focuses on the latency-sensitive e-commerce setting and measures SLMs' impact on both search relevance and search latency. We provide strong, replicable baselines based on Llama 3.1 8B Instruct fine-tuned on synthetically generated data and find that fine-tuned SLMs produce search quality comparable to or better than that of large language models such as GPT-4o while achieving up to 3× faster inference. All data, code, and training checkpoints are publicly released to spur further research on resource-efficient query understanding.
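To make the task concrete, the sketch below illustrates the kind of query-to-function-call translation the benchmark evaluates: a free-text shopping query is mapped by a model to a structured call against a declared function schema, which the application then validates and executes. The schema, function name, and field names here are illustrative assumptions for exposition, not the actual DispatchQA API.

```python
import json

# Hypothetical function schema an SLM would be prompted with
# (assumed for illustration; not the paper's actual schema).
SEARCH_PRODUCTS_SCHEMA = {
    "name": "search_products",
    "parameters": {
        "query": "string",      # free-text keywords
        "category": "string",   # optional product category
        "max_price": "number",  # optional price ceiling
    },
}

def execute_call(raw_model_output: str) -> dict:
    """Parse the model's emitted function call and keep only known arguments."""
    call = json.loads(raw_model_output)
    if call["name"] != SEARCH_PRODUCTS_SCHEMA["name"]:
        raise ValueError(f"unexpected function: {call['name']}")
    allowed = set(SEARCH_PRODUCTS_SCHEMA["parameters"])
    return {k: v for k, v in call["arguments"].items() if k in allowed}

# For a query like "waterproof hiking boots under $100", a fine-tuned
# SLM might emit the following JSON function call:
model_output = (
    '{"name": "search_products", '
    '"arguments": {"query": "waterproof hiking boots", "max_price": 100}}'
)
print(execute_call(model_output))
# -> {'query': 'waterproof hiking boots', 'max_price': 100}
```

Evaluation in this setting then hinges on whether the executed call retrieves relevant products (search relevance) and how quickly the model produces it (latency), which is where small models can trade favorably against larger ones.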
Anthology ID:
2025.emnlp-industry.154
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2221–2233
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.154/
Cite (ACL):
Joachim Daiber, Victor Maricato, Ayan Sinha, and Andrew Rabinovich. 2025. DispatchQA: A Benchmark for Small Function Calling Language Models in E-Commerce Applications. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 2221–2233, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
DispatchQA: A Benchmark for Small Function Calling Language Models in E-Commerce Applications (Daiber et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.154.pdf