MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models

Ahmad Chamma, Omar El Herraoui, Guokan Shang


Abstract
We introduce MixtureKit, a modular open-source framework for constructing, training, and analyzing Mixture-of-Experts (MoE) models from arbitrary pre-trained or fine-tuned checkpoints. MixtureKit supports three complementary strategies: (i) Traditional MoE, using a single router per transformer block to select experts; (ii) BTX (Branch-Train-Mix), adding routers at user-specified sub-layers for fine-grained token routing; and (iii) BTS (Branch-Train-Stitch), preserving experts intact and introducing lightweight stitch layers for controlled hub–expert information exchange. Given a single configuration dictionary, MixtureKit automatically modifies model configuration, patches decoder and causal LM classes, and exports a unified transformers-compatible checkpoint ready for inference or further fine-tuning. We also provide a visualization interface to inspect token routing, expert weight distributions, and layer-wise contributions. Experiments on multilingual code-switched (Arabic–Latin) data show that BTX models built with MixtureKit can outperform dense baselines across multiple benchmarks. The library is accessible at: https://github.com/MBZUAI-Paris/MixtureKit.
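As an illustration of the "Traditional MoE" strategy named in the abstract (one router per transformer block selecting experts), the following is a minimal sketch of top-k token routing in plain Python. It does not use or reproduce MixtureKit's API. The names `route_token`, `router_W`, and `experts`, the choice of `top_k=2`, and the softmax renormalization over the selected router logits are all assumptions made for this example; the abstract does not specify the gating function.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def route_token(x, router_W, experts, top_k=2):
    """Send one token vector through the top_k experts picked by a linear router.

    Hypothetical helper for illustration only: router_W has one row per expert,
    and each expert is a callable mapping a vector to a vector of the same size.
    """
    logits = matvec(router_W, x)  # one score per expert
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Assumption: gate weights are a softmax over the selected logits only.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        expert_out = experts[idx](x)
        out = [o + gate * y for o, y in zip(out, expert_out)]
    return out, chosen, gates


# Toy setup: three "experts" that simply scale their input by 1, 2, and 3.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0)]
router_W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [0.5, 2.0]

out, chosen, gates = route_token(x, router_W, experts, top_k=2)
print("chosen experts:", chosen)
print("gate weights:", gates)
print("mixed output:", out)
```

In this toy setup the router scores are 0.5, 2.0, and 2.5, so the third and second experts are selected and their outputs are mixed by the gate weights. A real MoE layer would apply this per token over batched tensors, with feed-forward networks as experts.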
Anthology ID:
2026.acl-demo.15
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Greg Durrett, Ping Jian
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
148–156
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-demo.15/
Cite (ACL):
Ahmad Chamma, Omar El Herraoui, and Guokan Shang. 2026. MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 148–156, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models (Chamma et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-demo.15.pdf