DispatchQA: A Benchmark for Small Function Calling Language Models in E-Commerce Applications
Joachim Daiber, Victor Maricato, Ayan Sinha, Andrew Rabinovich
Abstract
We introduce DispatchQA, a benchmark to evaluate how well small language models (SLMs) translate open-ended search queries into executable API calls via explicit function calling. Our benchmark focuses on the latency-sensitive e-commerce setting and measures SLMs' impact on both search relevance and search latency. We provide strong, replicable baselines based on Llama 3.1 8B Instruct fine-tuned on synthetically generated data and find that fine-tuned SLMs produce search quality comparable to or better than that of large language models such as GPT-4o while achieving up to 3× faster inference. All data, code, and training checkpoints are publicly released to spur further research on resource-efficient query understanding.
- Anthology ID:
- 2025.emnlp-industry.154
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou (China)
- Editors:
- Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2221–2233
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.154/
- Cite (ACL):
- Joachim Daiber, Victor Maricato, Ayan Sinha, and Andrew Rabinovich. 2025. DispatchQA: A Benchmark for Small Function Calling Language Models in E-Commerce Applications. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 2221–2233, Suzhou (China). Association for Computational Linguistics.
- Cite (Informal):
- DispatchQA: A Benchmark for Small Function Calling Language Models in E-Commerce Applications (Daiber et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.154.pdf