LogitRouter: a novel Attention variant for reducing Myopic Routing in Mixture of Experts

Felipe Rodriguez, Marcelo Mendoza


Abstract
Mixture-of-Experts (MoE) models have emerged as strong alternatives to traditional transformers, offering significant advantages in training and inference efficiency. At the core of this architecture lies the router, responsible for selecting which experts are activated for each token. Despite these advances, routing mechanisms continue to face stability challenges that the basic architecture fails to fully address. One such issue is Myopic Routing, where each token determines its route independently, without considering the routing decisions made for other tokens. To address this limitation, this work introduces LogitAttention, a variant of traditional attention, and, building on it, the LogitRouter, a novel routing architecture that incorporates contextual information about the routing of other tokens. Due to budget constraints, a set of simple experiments is designed to obtain preliminary evidence of performance trends, evaluated on established benchmarks such as BoolQ, MMLU, and ARC. The work concludes with an in-depth discussion of architectural variants, applicability, limitations, and future directions, intended to support continued research in this area.
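
The paper's exact formulation of LogitAttention and the LogitRouter is given in the PDF; as a rough illustration of the idea the abstract describes, the following minimal PyTorch sketch contextualizes per-token routing logits with an attention step before expert selection. All names and design choices here (ContextualRouter, the single-head attention over logits, top-k selection with renormalization) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextualRouter(nn.Module):
    """Hypothetical router sketch: a standard linear gate followed by an
    attention pass over the routing logits, so each token's expert choice
    can condition on the routing of the other tokens in the sequence."""

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Standard (myopic) router: one linear score per expert.
        self.gate = nn.Linear(d_model, n_experts)
        # Attention over the logits themselves, letting tokens exchange
        # routing information before experts are selected.
        self.logit_attn = nn.MultiheadAttention(
            embed_dim=n_experts, num_heads=1, batch_first=True
        )

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, d_model)
        logits = self.gate(x)                          # (batch, seq, n_experts)
        ctx, _ = self.logit_attn(logits, logits, logits)
        logits = logits + ctx                          # contextualized logits
        probs = F.softmax(logits, dim=-1)
        weights, experts = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        return weights, experts                        # per-token expert mix

# Smoke test with random activations.
router = ContextualRouter(d_model=64, n_experts=8, top_k=2)
w, e = router(torch.randn(2, 10, 64))
print(w.shape, e.shape)  # torch.Size([2, 10, 2]) torch.Size([2, 10, 2])

The only change relative to a standard top-k router in this sketch is the residual attention pass over the logits, which is where, under these assumptions, information about other tokens' routing enters each token's decision.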
Anthology ID:
2025.inlg-main.30
Volume:
Proceedings of the 18th International Natural Language Generation Conference
Month:
October
Year:
2025
Address:
Hanoi, Vietnam
Editors:
Lucie Flek, Shashi Narayan, Lê Hồng Phương, Jiahuan Pei
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
499–510
URL:
https://preview.aclanthology.org/ingest-luhme/2025.inlg-main.30/
Cite (ACL):
Felipe Rodriguez and Marcelo Mendoza. 2025. LogitRouter: a novel Attention variant for reducing Myopic Routing in Mixture of Experts. In Proceedings of the 18th International Natural Language Generation Conference, pages 499–510, Hanoi, Vietnam. Association for Computational Linguistics.
Cite (Informal):
LogitRouter: a novel Attention variant for reducing Myopic Routing in Mixture of Experts (Rodriguez & Mendoza, INLG 2025)
PDF:
https://preview.aclanthology.org/ingest-luhme/2025.inlg-main.30.pdf