Wenhui Zhang
2026
LLM-as-Scheduler: Agentic Workflow Dynamic Scheduling
Dawei Xiang | Kexin Chu | Wenyan Xu | Wenhui Zhang | Wei Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
As large language models (LLMs) improve, many applications are moving from a single LLM call to multi-agent systems. These systems often rely on either hand-designed or automatically optimized workflows with multiple verification and testing steps. While those extra steps can improve accuracy, they also increase latency and token costs. In practice, many queries do not need such heavy processing and can be handled well by a single strong agent. To address this inefficiency, we propose LLM-as-Scheduler (LAS), a system that dynamically chooses the right workflow for each query. LAS uses a two-stage cascade: first, a lightweight gate quickly evaluates each agent’s output; then, an LLM-based scheduler uses query features and gate signals to make more detailed routing decisions. Experiments show that LAS cuts token usage by 43% and reduces end-to-end latency by more than 36%, while causing at most a 1.4 percentage-point drop in accuracy compared with a strong fixed workflow.
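The two-stage cascade described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how the gate scores an output, what the scheduler sees, or when a query exits early. The sketch assumes that a single agent answers first, that the gate returns a confidence score in [0, 1], that a score above a threshold accepts the cheap answer, and that otherwise an LLM-based scheduler picks a heavier workflow by name. All names here (`route`, `Decision`, `accept_threshold`, `default_workflow`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interfaces; the abstract does not define these signatures.
Agent = Callable[[str], str]                       # query -> answer
Gate = Callable[[str, str], float]                 # (query, answer) -> score in [0, 1]
Scheduler = Callable[[str, float, List[str]], str] # (query, gate score, names) -> name


@dataclass
class Decision:
    workflow: str      # which path produced the answer
    answer: str
    gate_score: float  # signal from the lightweight gate


def route(
    query: str,
    single_agent: Agent,
    gate: Gate,
    scheduler: Scheduler,
    workflows: Dict[str, Agent],
    default_workflow: str,
    accept_threshold: float = 0.8,  # assumed value, not from the paper
) -> Decision:
    """Stage 1: cheap gate on a single agent's output.
    Stage 2: scheduler picks a heavier workflow only when the gate is unsure."""
    draft = single_agent(query)
    score = gate(query, draft)

    # Assumed early exit: a confident gate accepts the cheap answer,
    # which is where the token and latency savings would come from.
    if score >= accept_threshold:
        return Decision("single_agent", draft, score)

    # The scheduler sees the query and the gate signal, then names a workflow.
    choice = scheduler(query, score, list(workflows))
    if choice not in workflows:
        # Guard against an LLM returning a name that does not exist.
        choice = default_workflow
    return Decision(choice, workflows[choice](query), score)
```

In this sketch the gate and scheduler are plain callables, so either one could be backed by a small classifier or an LLM call without changing the routing logic.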