MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models

Ahmad Chamma; Omar El Herraoui; Guokan Shang

Abstract

We introduce MixtureKit, a modular open-source framework for constructing, training, and analyzing Mixture-of-Experts (MoE) models from arbitrary pre-trained or fine-tuned checkpoints. MixtureKit supports three complementary strategies: (i) Traditional MoE, which uses a single router per transformer block to select experts; (ii) BTX (Branch-Train-Mix), which adds routers at user-specified sub-layers for fine-grained token routing; and (iii) BTS (Branch-Train-Stitch), which keeps experts intact and introduces lightweight stitch layers for controlled hub–expert information exchange. Given a single configuration dictionary, MixtureKit automatically modifies the model configuration, patches the decoder and causal LM classes, and exports a unified transformers-compatible checkpoint ready for inference or further fine-tuning. We also provide a visualization interface for inspecting token routing, expert weight distributions, and layer-wise contributions. Experiments on multilingual code-switched (Arabic–Latin) data show that BTX models built with MixtureKit can outperform dense baselines across multiple benchmarks. The library is available at: https://github.com/MBZUAI-Paris/MixtureKit.
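The first strategy in the abstract, a Traditional MoE with a single router per transformer block, rests on learned top-k token routing. The following is a minimal, framework-free sketch of that general technique, not MixtureKit's code. It assumes a linear router (one weight row per expert), selection of the k highest-scoring experts, and softmax renormalization over only the selected experts. The function name `top_k_moe` and these design choices are illustrative assumptions, not MixtureKit's API or its exact routing rule.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(
    token: Vector,
    router_weights: List[Vector],
    experts: List[Callable[[Vector], Vector]],
    k: int = 2,
) -> Vector:
    """Route one token to its top-k experts and mix their outputs.

    Hypothetical sketch (not MixtureKit's implementation): a linear router
    scores every expert, only the k best experts are run, and their outputs
    are summed with weights from a softmax over the selected logits.
    """
    # Router logits: dot product of the token with each expert's router row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Indices of the k highest-scoring experts, best first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Gate weights are renormalized over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(token)
    for g, i in zip(gates, top):
        y = experts[i](token)  # Only the selected experts are evaluated.
        out = [o + g * v for o, v in zip(out, y)]
    return out


# Toy usage: three "experts" that scale their input by 1, 2, and 3.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0)]
router = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
print(top_k_moe([1.0, 0.0], router, experts, k=2))
```

With this toy router the logits are [1, 0, 2], so experts 2 and 0 are selected and blended with gates of roughly 0.73 and 0.27. BTX and BTS differ in where routers are placed or whether experts are mixed at all, which this single-block sketch does not attempt to model.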

Anthology ID:: 2026.acl-demo.15
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Greg Durrett, Ping Jian
Venue:: ACL
Publisher:: Association for Computational Linguistics
Pages:: 148–156
URL:: https://preview.aclanthology.org/ingest-acl/2026.acl-demo.15/
Cite (ACL):: Ahmad Chamma, Omar El Herraoui, and Guokan Shang. 2026. MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 148–156, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models (Chamma et al., ACL 2026)
PDF:: https://preview.aclanthology.org/ingest-acl/2026.acl-demo.15.pdf