Exploration-Driven Reinforcement Learning for Expert Routing Improvement in Mixture-of-Experts Language Models

Gyunyeop Kim, Sangwoo Kang


Abstract
The performance of MoE-based LLMs depends on the router’s ability to select suitable experts; however, the router is typically not explicitly supervised to acquire this routing ability. We propose Exploration-Driven Reinforcement Learning (ERL), which explicitly optimizes the router through exploration of alternative routing paths. For every input, ERL evaluates (i) the original routing path and (ii) paths in which an 𝛼-fraction of routing decisions is randomly perturbed, and treats their performance gap as an advantage signal for reinforcement learning. Moreover, MoE-ERLwPL mitigates the risk of performance collapse caused by expert over-specialization induced by routing reinforcement learning, by intentionally enforcing overlap in experts’ knowledge. Without adding parameters or external reward models, our method improves summarization (SAMSum, XSUM), question answering (SQuAD), and language modeling (WikiText-2), and raises routing quality, delivering up to 8.9× higher MRR than baselines over 100 perturbed routing paths. Code is available on our GitHub.
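As a rough illustration of the mechanism summarized in the abstract, the sketch below computes an ERL-style advantage for a single top-1 MoE layer: it scores the input under the router's own path and under a path in which an 𝛼-fraction of routing decisions is randomly replaced, then uses the gap as a policy-gradient signal on the router. This is a minimal sketch under those assumptions, not the authors' implementation; the names (`moe_forward`, `erl_router_loss`, `alpha`) are hypothetical.

```python
# Minimal sketch of the ERL idea described in the abstract (not the authors' code).
# Assumptions: a single top-1 MoE feed-forward layer and a scalar task loss;
# names such as `erl_router_loss` and `alpha` are illustrative only.
import torch
import torch.nn.functional as F

def moe_forward(x, experts, expert_idx):
    """Run each token through the expert chosen in `expert_idx` (one id per token)."""
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        mask = expert_idx == e
        if mask.any():
            out[mask] = expert(x[mask])
    return out

def erl_router_loss(x, targets, router, experts, task_loss_fn, alpha=0.2):
    logits = router(x)                           # [num_tokens, num_experts]
    log_probs = F.log_softmax(logits, dim=-1)
    original_idx = logits.argmax(dim=-1)         # (i) original routing path

    # (ii) randomly perturb an alpha-fraction of the routing decisions
    perturb_mask = torch.rand(x.size(0), device=x.device) < alpha
    random_idx = torch.randint(len(experts), (x.size(0),), device=x.device)
    perturbed_idx = torch.where(perturb_mask, random_idx, original_idx)

    with torch.no_grad():
        loss_orig = task_loss_fn(moe_forward(x, experts, original_idx), targets)
        loss_pert = task_loss_fn(moe_forward(x, experts, perturbed_idx), targets)
        advantage = loss_pert - loss_orig        # >0 if the original path was better

    # Policy-gradient-style update: reinforce the original routing decisions in
    # proportion to how much the random perturbation hurt (or helped) performance.
    chosen_log_prob = log_probs.gather(1, original_idx.unsqueeze(1)).squeeze(1)
    return -(advantage * chosen_log_prob).mean()
```

For brevity the sketch uses top-1 routing and a single scalar performance gap; the paper additionally addresses the over-specialization this kind of routing reinforcement can induce (the MoE-ERLwPL variant), which is not reflected here.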
Anthology ID:
2025.findings-emnlp.1282
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
23592–23605
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1282/
DOI:
10.18653/v1/2025.findings-emnlp.1282
Cite (ACL):
Gyunyeop Kim and Sangwoo Kang. 2025. Exploration-Driven Reinforcement Learning for Expert Routing Improvement in Mixture-of-Experts Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 23592–23605, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Exploration-Driven Reinforcement Learning for Expert Routing Improvement in Mixture-of-Experts Language Models (Kim & Kang, Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1282.pdf
Checklist:
2025.findings-emnlp.1282.checklist.pdf