From Outcome to Process: Optimizing MoE Load Balancing with MCTS

Wenjun Ke; Hengyuan Xu; Ziyu Shang; Yao He; Jiahao Wang; Zijie Xu; Peng Wang; Yuhang Lou; Jiajun Liu

Abstract

Mixture of Experts (MoE) dynamically routes inputs to specialized expert networks, enabling large language models to scale capacity with low inference overhead. To further improve MoE's parameter efficiency in resource-constrained scenarios, LoRA-MoE integrates LoRA for lightweight adaptation while preserving MoE's specialization. Despite these benefits, the effectiveness of LoRA-MoE still hinges on balanced expert utilization; in practice, certain experts dominate activations while most remain underutilized. Existing balancing strategies constrain the final distribution of expert usage but overlook the routing decisions made at each layer. As a result, imbalances gradually accumulate across the routing hierarchy. To address this challenge, we propose LayerMoE, a novel three-stage framework that leverages process-level rewards to guide balanced expert routing. Specifically, to overcome the limitation of focusing only on final losses and ignoring intermediate routing, we introduce Monte Carlo Tree Search (MCTS)-based sampling that decomposes outcome-level supervision into layer-wise reward signals, guiding expert choices throughout the routing process. For efficiency, we organize Transformer layers into groups, which constrains the MCTS search space and keeps exploration overhead tractable while retaining the hierarchical structure. Extensive experiments on representative datasets (e.g., ARC, RACE, OBQA) show that applying LayerMoE consistently improves the performance of state-of-the-art LoRA-MoE baselines, yielding an average accuracy gain of 1.39% and a maximum improvement of 2.50%.
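
The gap between outcome-level and layer-level balance that the abstract describes can be illustrated with a toy calculation. The sketch below is not the LayerMoE reward or the paper's code; it is a minimal illustration under stated assumptions: routing is top-1, the "final" distribution is approximated by pooling expert choices over all layers, and balance is scored with normalized entropy (1.0 means perfectly uniform usage, 0.0 means a single expert takes every token). The names `normalized_entropy` and `balance_report` are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(assignments, num_experts):
    """Entropy of the expert-usage distribution, scaled to [0, 1]."""
    counts = Counter(assignments)
    total = len(assignments)
    entropy = 0.0
    for expert in range(num_experts):
        p = counts.get(expert, 0) / total
        if p > 0:
            entropy -= p * math.log(p)
    return entropy / math.log(num_experts)


def balance_report(routing, num_experts):
    """Return (per-layer balance scores, balance of usage pooled over layers).

    `routing[l][t]` is the expert index chosen for token t at layer l
    (top-1 routing is an assumption made for this illustration).
    """
    per_layer = [normalized_entropy(layer, num_experts) for layer in routing]
    pooled = [expert for layer in routing for expert in layer]
    return per_layer, normalized_entropy(pooled, num_experts)


# Toy case: two layers, four tokens, two experts.
# Layer 0 sends every token to expert 0; layer 1 sends every token to expert 1.
routing = [[0, 0, 0, 0], [1, 1, 1, 1]]
per_layer, overall = balance_report(routing, num_experts=2)
print(per_layer, overall)
```

In this toy case the pooled usage is perfectly uniform while each individual layer is fully collapsed onto one expert, which is the kind of imbalance a purely outcome-level constraint cannot see and a layer-wise signal can.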

Anthology ID: 2026.findings-acl.1440
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 28831–28848
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1440/
Cite (ACL): Wenjun Ke, Hengyuan Xu, Ziyu Shang, Yao He, Jiahao Wang, Zijie Xu, Peng Wang, Yuhang Lou, and Jiajun Liu. 2026. From Outcome to Process: Optimizing MoE Load Balancing with MCTS. In Findings of the Association for Computational Linguistics: ACL 2026, pages 28831–28848, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): From Outcome to Process: Optimizing MoE Load Balancing with MCTS (Ke et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1440.pdf
Checklist: 2026.findings-acl.1440.checklist.pdf
