Che Wang


2026

Large Language Models (LLMs) exhibit remarkable capabilities, but no single model optimally balances serving quality and deployment cost across diverse tasks. Multi-LLM systems address this challenge through intelligent routing mechanisms that dynamically allocate each query to the most appropriate model. However, existing routing methods suffer from two fundamental limitations: (i) dependence on extensive full-response datasets for training, and (ii) poor scalability when incorporating new models, typically necessitating retraining from scratch. In this paper, we propose SemiRouter, a novel LLM routing framework designed for data-sparse and evolving model environments. Our approach combines a data-efficient training methodology with an adaptive architecture that enables seamless integration of new models under limited supervision. As an extension, we also incorporate energy footprint as an additional deployment cost in our routing decisions. Empirical evaluations demonstrate that our method improves data efficiency, adaptability, and routing accuracy over existing approaches, providing a scalable solution for dynamic multi-LLM deployment.
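The core routing idea the abstract describes, allocating each query to the model that best balances serving quality against deployment cost, can be illustrated with a minimal score-based rule. This is a generic sketch, not the paper's method: the model names, quality estimates, per-call costs, and the trade-off weight lam are all hypothetical.

```python
def route(quality, cost, lam=0.5):
    """Pick the model maximizing predicted quality minus lam * cost.

    quality: dict mapping model name -> predicted quality for this query
             (e.g., from a lightweight learned scorer)
    cost:    dict mapping model name -> per-call deployment cost
             (could also fold in an energy-footprint term)
    lam:     trade-off weight between quality and cost
    """
    return max(quality, key=lambda m: quality[m] - lam * cost[m])

# Illustrative (made-up) per-query estimates for two candidate models.
quality = {"small-llm": 0.62, "large-llm": 0.91}
cost = {"small-llm": 0.1, "large-llm": 1.0}

print(route(quality, cost, lam=0.5))  # -> small-llm (0.57 beats 0.41)
print(route(quality, cost, lam=0.0))  # -> large-llm (cost ignored)
```

With a moderate cost penalty the cheaper model wins; setting lam to zero recovers pure quality-maximizing routing, so the single weight controls the entire quality/cost frontier.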