Shengji Tang
2026
A Scalable Multi-LLM Collaboration System with Retrieval-based Selection and Exploration-Exploitation-Driven Enhancement
Shengji Tang | Jianjian Cao | Weihao Lin | Jiale Hong | Bo Zhang | Shuyue Hu | Lei Bai | Tao Chen | Wanli Ouyang | Peng Ye
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Existing multi-LLM collaboration systems often encounter scalability challenges when integrating new LLMs and tasks, leading to suboptimal performance. To address this, we propose SMCS, a Scalable Multi-LLM Collaboration System designed to effectively coordinate multiple open-source LLMs. The system consists of two core components: a Retrieval-based Prior Selection (RPS) module, which dynamically selects the most suitable LLMs for each input, and an Exploration–Exploitation-Driven Posterior Enhancement (EPE) module, which fosters response diversity and selects high-quality outputs through a hybrid scoring mechanism. Experiments on eight mainstream benchmarks validate the effectiveness of our system: by integrating fifteen open-source LLMs, SMCS outperforms prevailing closed-source LLMs, e.g., GPT-4.1 (**+5.36%**) and GPT-o3-mini (**+5.28%**), across multiple tasks. Remarkably, it even exceeds the average of the best open-source LLM results on each dataset (**+2.86%**), advancing the empirical performance frontier of open-source collaboration. The code is released at https://github.com/magent4aci/SMCS.
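To make the idea of retrieval-based selection concrete, the following is a minimal sketch and not the authors' RPS implementation. It assumes a hypothetical support set of past queries, each stored as an embedding with per-model scores, and picks the models that scored best on the nearest stored queries. The function names, the toy two-dimensional embeddings, and the model names are all illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
SupportItem = Tuple[Vector, Dict[str, float]]  # (query embedding, per-model score)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_models(
    query_vec: Vector,
    support: List[SupportItem],
    k_neighbors: int = 2,
    top_models: int = 2,
) -> List[str]:
    """Hypothetical selector: retrieve the k most similar past queries,
    sum each model's scores over them, and return the highest-scoring models."""
    neighbors = sorted(
        support, key=lambda item: cosine(query_vec, item[0]), reverse=True
    )[:k_neighbors]
    totals: Dict[str, float] = {}
    for _, scores in neighbors:
        for model, score in scores.items():
            totals[model] = totals.get(model, 0.0) + score
    # Sort by descending total score, breaking ties by model name.
    return sorted(totals, key=lambda m: (-totals[m], m))[:top_models]


# Toy support set (assumed data, for illustration only).
SUPPORT: List[SupportItem] = [
    ([1.0, 0.0], {"math_llm": 1.0, "code_llm": 0.0, "chat_llm": 0.5}),
    ([0.9, 0.1], {"math_llm": 1.0, "code_llm": 0.0, "chat_llm": 0.0}),
    ([0.0, 1.0], {"math_llm": 0.0, "code_llm": 1.0, "chat_llm": 0.5}),
]
```

In this sketch, adding a new model only requires adding its scores to the support set, which is one plausible reading of why a retrieval-based design scales to new LLMs and tasks without retraining a selector.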
LLMRouterBench: A Massive Benchmark and Unified Framework for LLM Routing
Hao Li | Yiqun Zhang | Zhaoyan Guo | Chenxu Wang | Shengji Tang | Qiaosheng Zhang | Yang Chen | Biqing Qi | Peng Ye | Lei Bai | Zhen Wang | Shuyue Hu
Findings of the Association for Computational Linguistics: ACL 2026
Large language model (LLM) routing assigns each query to the most suitable model from an ensemble. We introduce LLMRouterBench, a large-scale benchmark and unified framework for LLM routing. It comprises over 400K instances from 21 datasets and 33 models. Moreover, it provides comprehensive metrics for both performance-oriented and performance-cost trade-off routing, and integrates 10 representative routing baselines. Using LLMRouterBench, we systematically re-evaluate the field. While confirming strong model complementarity—the central premise of LLM routing—we find that many routing methods exhibit similar performance under unified evaluation, and several recent approaches, including commercial routers, fail to reliably outperform a simple baseline. Meanwhile, a substantial gap remains to the Oracle, driven primarily by persistent model-recall failures. We further show that backbone embedding models have limited impact, that larger ensembles exhibit diminishing returns compared to careful model curation, and that the benchmark also enables latency-aware analysis. All code and data are available at https://github.com/ynulihao/LLMRouterBench.
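The comparison between a router, a simple baseline, and the Oracle can be illustrated with a small sketch. It assumes a hypothetical correctness table (one boolean per model per query) and treats the best single model as the simple baseline, which is an assumption here rather than the benchmark's definition. The Oracle is taken to be correct whenever at least one model in the pool is correct. None of the names below come from the LLMRouterBench code.

```python
from typing import Dict, List

Correctness = Dict[str, List[bool]]  # model name -> per-query correctness


def best_single_accuracy(correct: Correctness) -> float:
    """Accuracy of always sending every query to the single best model."""
    return max(sum(flags) / len(flags) for flags in correct.values())


def oracle_accuracy(correct: Correctness) -> float:
    """Accuracy of an ideal router: a query counts if any model solves it."""
    n_queries = len(next(iter(correct.values())))
    solved = sum(
        any(flags[i] for flags in correct.values()) for i in range(n_queries)
    )
    return solved / n_queries


def router_accuracy(correct: Correctness, choices: List[str]) -> float:
    """Accuracy of a router that picks choices[i] for query i."""
    return sum(correct[model][i] for i, model in enumerate(choices)) / len(choices)


# Toy correctness table (assumed data, for illustration only).
CORRECT: Correctness = {
    "model_a": [True, True, False, False],
    "model_b": [False, True, True, False],
    "model_c": [False, False, False, True],
}
ROUTER_CHOICES = ["model_a", "model_b", "model_b", "model_a"]
```

On this toy table the best single model reaches 0.5, the example router reaches 0.75, and the Oracle reaches 1.0. The router's one miss is the last query, where it never picks `model_c`, the only model that solves it.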