Infinity-MoE: Generalizing Mixture of Experts to Infinite Experts

Shota Takashiro, Takeshi Kojima, Shohei Taniguchi, Yusuke Iwasawa, Yutaka Matsuo


Abstract
The Mixture of Experts (MoE) selects a few feed-forward networks (FFNs) per token, achieving an effective trade-off between computational cost and performance. In conventional MoE, each expert is treated as entirely independent, and experts are combined in a discrete space. As a result, as the number of experts increases, it becomes difficult to train each expert effectively. To stabilize training while increasing the number of experts, we propose ∞-MoE, which selects a portion of the parameters of large FFNs based on continuous values sampled for each token. By placing experts in a continuous space, this approach allows for an infinite number of experts while maintaining computational efficiency. Experiments show that a GPT-2 Small-based ∞-MoE model, with 129M active and 186M total parameters, achieves performance comparable to a dense GPT-2 Medium with 350M parameters. Adjusting the number of sampled experts at inference time allows a flexible trade-off between accuracy and speed, with an improvement of up to 2.5% in accuracy over conventional MoE.
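To make the continuous-expert idea in the abstract concrete, below is a minimal PyTorch sketch of one possible reading: a router samples k continuous positions in [0, 1] per token, and each position selects a window of hidden units inside one shared large FFN. The class name, shapes, and the window-slicing scheme are illustrative assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of continuous expert selection: each "expert" is a
# window of hidden units inside one large FFN, indexed by a continuous
# value in [0, 1] sampled per token. Not the paper's released code.
class ContinuousMoEFFN(nn.Module):
    def __init__(self, d_model=768, d_hidden=8192, window=1024, k=2):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_hidden)   # shared large-FFN weights
        self.w_out = nn.Linear(d_hidden, d_model)
        self.router = nn.Linear(d_model, 2 * k)    # mean / log-std per sample
        self.window, self.k, self.d_hidden = window, k, d_hidden

    def forward(self, x):                          # x: (batch, d_model)
        stats = self.router(x).view(-1, self.k, 2)
        mu, log_std = stats[..., 0], stats[..., 1]
        # Sample k continuous expert positions per token (reparameterized
        # Gaussian squashed into (0, 1) with a sigmoid).
        pos = torch.sigmoid(mu + log_std.exp() * torch.randn_like(mu))
        h = F.gelu(self.w_in(x))                   # (batch, d_hidden)
        outs = []
        for i in range(self.k):
            # Map each continuous position to a contiguous window of hidden
            # units and zero out the rest; a soft mask would also work.
            start = (pos[:, i] * (self.d_hidden - self.window)).long()
            mask = torch.zeros_like(h)
            for b in range(h.size(0)):
                mask[b, start[b]:start[b] + self.window] = 1.0
            outs.append(self.w_out(h * mask))
        return torch.stack(outs).mean(0)

# Example: route a batch of 4 token embeddings through the layer.
layer = ContinuousMoEFFN()
y = layer(torch.randn(4, 768))                     # -> (4, 768)
```

Under this reading, neighbouring positions select overlapping parameter windows, so experts share parameters rather than being fully independent; that sharing is one plausible way such a scheme could keep training stable as the effective expert count grows.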
Anthology ID:
2026.eacl-short.33
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
448–456
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.33/
Cite (ACL):
Shota Takashiro, Takeshi Kojima, Shohei Taniguchi, Yusuke Iwasawa, and Yutaka Matsuo. 2026. Infinity-MoE: Generalizing Mixture of Experts to Infinite Experts. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 448–456, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Infinity-MoE: Generalizing Mixture of Experts to Infinite Experts (Takashiro et al., EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.33.pdf
Checklist:
 2026.eacl-short.33.checklist.pdf