CoSMoEs: Compact Sparse Mixture of Experts
Patrick Huber, Akshat Shrivastava, Ernie Chang, Chinnadhurai Sankar, Ahmed A Aly, Adithya Sagar
Abstract
Sparse Mixture of Experts (MoE) models are widely used foundation architectures at large scale, yet remain under-explored at smaller sizes. In this work, we introduce Compact Sparse Mixture of Experts (CoSMoEs) for on-device inference, addressing three key challenges: Quality, Memory, and Latency. On the quality front, we conduct a fair evaluation (removing confounding factors) and show that MoE architectures outperform dense models at on-device scale. We further propose weight-decomposed experts, which improve MoE performance beyond the standard formulation. On the memory and latency front, we address the prohibitively large parameter count of MoE models by improving expert offloading efficiency through a novel training-time loss, reducing inference latency for on-device deployment.
- Anthology ID:
- 2026.alvr-main.4
- Volume:
- Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, USA
- Editors:
- Qianqi Yan, Syrielle Montariol, Yue Fan, Jing Gu, Jiayi Pan, Manling Li, Parisa Kordjamshidi, Alane Suhr, Xin Eric Wang
- Venues:
- ALVR | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 46–56
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.alvr-main.4/
- Cite (ACL):
- Patrick Huber, Akshat Shrivastava, Ernie Chang, Chinnadhurai Sankar, Ahmed A Aly, and Adithya Sagar. 2026. CoSMoEs: Compact Sparse Mixture of Experts. In Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR), pages 46–56, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal):
- CoSMoEs: Compact Sparse Mixture of Experts (Huber et al., ALVR 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.alvr-main.4.pdf