Generalising LLM Routing using Past Performance Retrieval: A Few-Shot Router is Sufficient

Clovis Varangot-Reille, Christophe Bouvard, Antoine Gourru


Abstract
We study model routing for Large Language Model (LLM)-based systems. A model, called the router, dynamically chooses which LLM should handle a given input/query. We challenge the assumption that complex routers are necessary for generalising to new candidate LLMs. We introduce ContextualRouter, a simple meta-evaluation framework that predicts per-model performance for new queries by retrieving similar past queries and reweighting model scores with lightweight attention. During inference, the router balances estimated performance and cost by adjusting a tunable cost penalty parameter. This allows the router to adapt dynamically to the addition or removal of LLMs without the need for retraining. Across five routing benchmarks (SPROUT, RouterBench, LiveBench, BigGenBench, and EmbedLLM), ContextualRouter matches the quality–cost trade-offs of other generalisable routers. Surprisingly, a simpler non-parametric baseline, k-nearest-neighbour averaging, performs comparably or better, achieving strong performance estimation, high NDCG, and substantial cost savings. Retrieval-based routers remain robust to k, embedding size, data sparsity, and retrieval degradation, and they generalise to unseen queries and models with as little as 1% of the historical data. These results suggest that effective retrieval alone enables generalisable LLM routing.
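To make the k-nearest-neighbour averaging baseline and the cost penalty concrete, the following is a minimal sketch and not the authors' implementation. It assumes cosine similarity for retrieval, an unweighted mean of neighbour scores, and a linear utility of the form estimate minus penalty times cost; the abstract does not specify any of these. The names `knn_route`, `history`, `costs`, and `cost_penalty`, as well as the toy data, are hypothetical. The attention-based reweighting used by ContextualRouter is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_route(query_emb, history, costs, k=2, cost_penalty=0.0):
    """Pick a model for a query from past per-model scores.

    history: list of (embedding, {model_name: score}) for past queries.
    costs:   {model_name: cost}; its keys define the current candidate set.
    Returns (chosen_model, {model_name: estimated_score}).
    """
    # Retrieve the k past queries most similar to the new query.
    neighbours = sorted(
        history, key=lambda item: cosine(query_emb, item[0]), reverse=True
    )[:k]

    # Estimate each candidate's performance as the mean score over neighbours.
    estimates = {}
    for model in costs:
        scores = [s[model] for _, s in neighbours if model in s]
        if scores:
            estimates[model] = sum(scores) / len(scores)
    if not estimates:
        raise ValueError("no candidate model has scores among the neighbours")

    # Assumed linear trade-off: a higher cost_penalty favours cheaper models.
    utilities = {m: est - cost_penalty * costs[m] for m, est in estimates.items()}
    return max(utilities, key=utilities.get), estimates


# Hypothetical toy data: 2-d embeddings and scores for two candidate models.
history = [
    ([1.0, 0.0], {"small": 0.9, "large": 1.0}),
    ([0.9, 0.1], {"small": 0.8, "large": 1.0}),
    ([0.0, 1.0], {"small": 0.1, "large": 0.9}),
]
costs = {"small": 1.0, "large": 10.0}

print(knn_route([1.0, 0.05], history, costs, k=2, cost_penalty=0.0)[0])
print(knn_route([1.0, 0.05], history, costs, k=2, cost_penalty=0.05)[0])
```

In this sketch, adding or removing a candidate only means editing `costs` and the stored scores in `history`; nothing is trained, which is the property the abstract attributes to retrieval-based routing.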
Anthology ID:
2026.eacl-srw.22
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Selene Baez Santamaria, Sai Ashish Somayajula, Atsuki Yamaguchi
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
304–319
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.22/
Cite (ACL):
Clovis Varangot-Reille, Christophe Bouvard, and Antoine Gourru. 2026. Generalising LLM Routing using Past Performance Retrieval: A Few-Shot Router is Sufficient. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 304–319, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Generalising LLM Routing using Past Performance Retrieval: A Few-Shot Router is Sufficient (Varangot-Reille et al., EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.22.pdf