Accelerating Dense LLMs via L0-regularized Mixture-of-Experts

Zhenyu Zhang, JiuDong Yang, Taozhaowen Taozhaowen, Meng Chen


Abstract
Large language models (LLMs) achieve strong performance but suffer from slow and costly inference. Existing acceleration methods often cause noticeable performance degradation, while Mixture-of-Experts (MoE) models require extensive computational resources. In this paper, we propose L0-MoE, a lightweight MoE approach that uses L0 regularization to accelerate dense LLMs with almost no loss in performance. Our method introduces a cluster confusion matrix for domain-aware dataset curation and applies dynamic batching for efficient training. Experiments show that L0-MoE achieves up to a 2.5x speedup over dense models while maintaining competitive performance, outperforming existing LLM acceleration baselines.
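The abstract does not spell out how the L0 penalty is applied, but the general idea of learning which experts to keep via a differentiable L0 surrogate can be sketched with the standard hard-concrete relaxation commonly used for L0 regularization (Louizos et al., 2018). The module and parameter names below (`L0Gate`, `num_experts`, `beta`, `gamma`, `zeta`) and all hyper-parameter values are assumptions of this sketch, not details taken from the paper.

```python
# Minimal sketch: a per-expert L0-regularized gate via the hard-concrete
# relaxation. Assumption: this is NOT the paper's exact gating design.
import math
import torch
import torch.nn as nn


class L0Gate(nn.Module):
    """Differentiable 0/1 gates over experts; their expected L0 norm is penalized."""

    def __init__(self, num_experts: int, beta: float = 2 / 3,
                 gamma: float = -0.1, zeta: float = 1.1):
        super().__init__()
        # One gate logit per expert; large values keep the expert, small values prune it.
        self.log_alpha = nn.Parameter(torch.zeros(num_experts))
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self) -> torch.Tensor:
        if self.training:
            # Reparameterized hard-concrete sample, so gradients reach log_alpha.
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            # Deterministic gates at inference; stretching and clamping yields exact zeros.
            s = torch.sigmoid(self.log_alpha)
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def expected_l0(self) -> torch.Tensor:
        # Differentiable surrogate for the number of non-zero gates (active experts).
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()
```

In this kind of setup the training objective would add `lambda_l0 * gate.expected_l0()` to the task loss, and at inference any expert whose gate is exactly zero can be dropped, which is where the reported speedup over the dense model would come from.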
Anthology ID: 2025.acl-short.39
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 504–513
URL: https://preview.aclanthology.org/landing_page/2025.acl-short.39/
Cite (ACL): Zhenyu Zhang, JiuDong Yang, Taozhaowen Taozhaowen, and Meng Chen. 2025. Accelerating Dense LLMs via L0-regularized Mixture-of-Experts. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 504–513, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Accelerating Dense LLMs via L0-regularized Mixture-of-Experts (Zhang et al., ACL 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.acl-short.39.pdf