Yuting Ding
2026
H-MAS: Hierarchical Multi-Agent Scheduling for Multi-Tenant LLM Serving
Yuhan Liu | Cong Xu | Qi Jia | Yihua Wang | Feiyu Chen | Liang Jin | Lu Liu | Yaqian Zhao | Yuting Ding | Xiang Li
Findings of the Association for Computational Linguistics: ACL 2026
Multi-tenant Model-as-a-Service (MaaS) LLM serving must maintain stringent quality of service (QoS) despite heterogeneous requests competing for constrained GPU resources. In practice, MaaS workloads exhibit non-stationarity across multiple time scales, including request bursts, request-composition drift, and persistent workload shifts. Yet existing request schedulers often rely on a single fixed policy (e.g., First-Come-First-Served, FCFS) that remains unchanged at runtime, which can lead to unstable QoS as the workload changes. We propose H-MAS, a hierarchical multi-agent scheduler that operates in a layered closed loop: a perception/prediction layer infers lightweight request attributes and cost signals; a feedback layer summarizes runtime metrics into short- and long-horizon QoS states; a hierarchical control layer updates the active scheduling policy over longer horizons and tunes execution parameters over shorter horizons; and an execution layer applies these decisions during inference. Experiments with load scaling and Azure-trace replays show that H-MAS achieves 1.2×–3.0× higher Goodput than SGLang and vLLM, and maintains more stable QoS under workload drift, diverse request lengths, and heterogeneous SLO targets.
2020
PharmMT: A Neural Machine Translation Approach to Simplify Prescription Directions
Jiazhao Li | Corey Lester | Xinyan Zhao | Yuting Ding | Yun Jiang | V.G.Vinod Vydiswaran
Findings of the Association for Computational Linguistics: EMNLP 2020
The language used by physicians and health professionals in prescription directions includes medical jargon and implicit directives, which causes much confusion among patients. Human intervention to simplify the language at the pharmacies may introduce additional errors that can lead to potentially severe health outcomes. We propose a novel machine translation-based approach, PharmMT, to automatically and reliably simplify prescription directions into patient-friendly language, thereby significantly reducing pharmacist workload. We evaluate the proposed approach over a dataset consisting of over 530K prescriptions obtained from a large mail-order pharmacy. The end-to-end system achieves a BLEU score of 60.27 against the reference directions generated by pharmacists, a 39.6% relative improvement over the rule-based normalization. Pharmacists judged 94.3% of the simplified directions as usable as-is or with minimal changes. This work demonstrates the feasibility of a machine translation-based tool for simplifying prescription directions in real-world settings.