Soham Shah

2025

Select-then-Route: Taxonomy guided Routing for LLMs
Soham Shah | Kumar Shridhar
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Recent advances in large language models (LLMs) have boosted performance across a broad spectrum of natural‐language tasks, yet no single model excels uniformly across domains. Sending each query to the most suitable model mitigates this limitation, but deciding among *all* available LLMs for each query is prohibitively expensive. Both accuracy and latency can improve if the decision space is first narrowed and a suitable model is then selected for the given query. We introduce Select-then-Route (StR), a two‐stage framework that first *selects* a small, task‐appropriate pool of LLMs and then *routes* each query within that pool through an adaptive cascade. StR first employs a lightweight, *taxonomy‐guided selector* that maps each query to models proven proficient for its semantic class (e.g., reasoning, code, summarisation). Within the selected pool, a *confidence‐based cascade* begins with the cheapest model and escalates only when a multi‐judge agreement test signals low reliability. Across six public benchmarks spanning various domains, StR improves end‐to‐end accuracy from 91.7% (best single model) to 94.3% while reducing inference cost by 4×. Because both the taxonomy and the multi-judge evaluation thresholds are tunable, StR exposes a smooth cost–accuracy frontier, enabling users to dial in the trade‐off that best fits their latency and budget constraints.
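
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Model` record, the `select_pool` and `route` helpers, the toy classifier, the judges, and the cost figures are all hypothetical names and values chosen for illustration. It shows only the control flow: pick a pool by semantic class, then try models from cheapest to most expensive and stop once enough judges accept the answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# All names below are illustrative assumptions, not the paper's API.
@dataclass
class Model:
    name: str
    cost: float                      # assumed cost per call, arbitrary units
    answer: Callable[[str], str]     # stand-in for a real LLM call

Judge = Callable[[str, str], bool]   # (query, answer) -> accept?

def select_pool(query: str,
                classify: Callable[[str], str],
                taxonomy: Dict[str, List[Model]]) -> List[Model]:
    """Stage 1: map the query to its class and return that class's
    models, ordered from cheapest to most expensive."""
    return sorted(taxonomy[classify(query)], key=lambda m: m.cost)

def route(query: str, pool: List[Model],
          judges: List[Judge], min_agree: int) -> Tuple[str, str, float]:
    """Stage 2: cascade through the pool, escalating while fewer than
    `min_agree` judges accept the current answer."""
    if not pool:
        raise ValueError("empty model pool")
    spent = 0.0
    for model in pool:
        spent += model.cost
        reply = model.answer(query)
        votes = sum(1 for judge in judges if judge(query, reply))
        if votes >= min_agree:
            break                    # reliable enough, stop escalating
    # If no model passes, the last (most expensive) answer is returned.
    return model.name, reply, spent

# Toy setup: a cheap model that only knows one fact, and a costly one.
small = Model("small", 1.0, lambda q: "4" if q == "2+2" else "unsure")
large = Model("large", 10.0, lambda q: "408" if q == "17*24" else "4")
taxonomy = {"math": [large, small]}
classify = lambda q: "math"          # toy single-class classifier
judges = [lambda q, a: a != "unsure", lambda q, a: a.isdigit()]

cheap_result = route("2+2", select_pool("2+2", classify, taxonomy), judges, 2)
escalated_result = route("17*24", select_pool("17*24", classify, taxonomy), judges, 2)
print(cheap_result, escalated_result)
```

In this sketch, raising `min_agree` makes escalation more likely (higher cost, potentially higher accuracy), which loosely mirrors the tunable threshold the abstract mentions.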

2023

In everyday life, humans often plan their actions by following step-by-step instructions in the form of goal-oriented scripts. Previous work has exploited language models (LMs) to plan for abstract goals of stereotypical activities (e.g., “make a cake”), but leaves more specific goals with multi-facet constraints understudied (e.g., “make a cake for diabetics”). In this paper, we define the task of constrained language planning for the first time. We propose an over-generate-then-filter approach to improve large language models (LLMs) on this task, and use it to distill a novel constrained language planning dataset, Coscript, which consists of 55,000 scripts. Empirical results demonstrate that our method significantly improves the constrained language planning ability of LLMs, especially on constraint faithfulness. Furthermore, Coscript is demonstrated to be quite effective in endowing smaller LMs with constrained language planning ability.
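
The over-generate-then-filter idea can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's method: `generate` stands in for sampling a script from an LLM, and `overlap_score` is a hypothetical word-overlap heuristic used here in place of whatever faithfulness filter the authors apply. The function names and the threshold value are likewise assumptions.

```python
from typing import Callable, List

def over_generate_then_filter(goal: str,
                              generate: Callable[[str, int], str],
                              score: Callable[[str, str], float],
                              k: int = 5,
                              threshold: float = 0.5) -> List[str]:
    """Sample k candidate scripts for the goal, then keep only those
    whose score against the goal reaches the threshold."""
    candidates = [generate(goal, i) for i in range(k)]
    return [s for s in candidates if score(goal, s) >= threshold]

def overlap_score(goal: str, script: str) -> float:
    """Hypothetical filter: fraction of goal words present in the script."""
    goal_words = set(goal.lower().split())
    script_words = set(script.lower().split())
    return len(goal_words & script_words) / len(goal_words)

# Canned outputs standing in for LLM samples (illustrative only).
canned = [
    "buy sugar; mix batter; bake cake",
    "buy sweetener; mix batter; bake a cake for diabetics",
    "order a pizza",
]

def fake_generate(goal: str, i: int) -> str:
    return canned[i % len(canned)]

kept = over_generate_then_filter(
    "make a cake for diabetics", fake_generate, overlap_score, k=3
)
print(kept)
```

Only the candidate that mentions the constraint survives the filter in this toy run; a real filter would need to judge constraint faithfulness semantically rather than by shared words.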

Co-authors

Yanghua Xiao 1

Deqing Yang 1

Siyu Yuan 1

Venues

acl 1
emnlp 1
