ModularMoE: Fast LLM Customization with Parameter-Sharing Mixture-of-Experts for Low-Resource Settings
Jiaxing Liu, Qi Qi, Haifeng Sun, Dunjun Li, Zirui Zhuang, Bo He, Xiang Yang, Cong Liu, Jianxin Liao, Jingyu Wang
Abstract
The massive size of Large Language Models (LLMs) imposes substantial computational and storage burdens, particularly on devices with limited hardware resources. Compared to foundation models, smaller and more specialized models are often more suitable for practical deployment. Existing customization approaches, such as the conventional “prune-then-finetune” paradigm or task-agnostic deployment strategies, either incur excessive computational costs or lead to suboptimal task performance. The recently popular Mixture-of-Experts (MoE) architecture exhibits a strong ability to mitigate inter-task interference, offering a new perspective on model deployment. In this paper, we introduce ModularMoE, a training framework that converts pre-trained LLMs into parameter-sharing MoE models for lightweight deployment. Exploiting the emergent modularity within LLMs, we split the feed-forward layers into multiple disjoint modules. Each expert is then constructed as a combination of such modules, enabling knowledge sharing across experts and thereby improving parameter efficiency within MoEs. Extensive experiments across multiple downstream tasks demonstrate that ModularMoE outperforms other state-of-the-art baselines at the same sparsity level, achieving an average performance improvement of 4.10% to 28.75% while delivering up to 2.71× inference speedup.

- Anthology ID: 2026.findings-acl.174
- Volume: Findings of the Association for Computational Linguistics: ACL 2026
- Month: July
- Year: 2026
- Address: San Diego, California, United States
- Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 3562–3575
- URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.174/
- Cite (ACL): Jiaxing Liu, Qi Qi, Haifeng Sun, Dunjun Li, Zirui Zhuang, Bo He, Xiang Yang, Cong Liu, Jianxin Liao, and Jingyu Wang. 2026. ModularMoE: Fast LLM Customization with Parameter-Sharing Mixture-of-Experts for Low-Resource Settings. In Findings of the Association for Computational Linguistics: ACL 2026, pages 3562–3575, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal): ModularMoE: Fast LLM Customization with Parameter-Sharing Mixture-of-Experts for Low-Resource Settings (Liu et al., Findings 2026)
- PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.174.pdf
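The abstract states the core construction only at a high level. Feed-forward neurons are split into disjoint modules, and each expert is a combination of modules, so overlapping experts share weights. The pure-Python toy below illustrates that idea under several assumptions, all hypothetical and none taken from the paper:

- a plain ReLU FFN without biases;
- a contiguous, equal-size partition of hidden neurons;
- the expert-to-module assignment in `EXPERTS`;
- the top-1 linear router `route_top1`;
- all sizes.

The paper derives its modules from emergent modularity in the pre-trained model, which this sketch does not model.

```python
import random

# Toy sizes chosen for illustration only; they are not taken from the paper.
D_MODEL, D_FF, NUM_MODULES = 4, 8, 4

# Hypothetical expert definitions: each expert is a tuple of module indices.
# Experts 0 and 1 share module 1; experts 1 and 2 share module 2.
EXPERTS = [(0, 1), (1, 2), (2, 3)]


def relu(v):
    return max(0.0, v)


class SharedModuleFFN:
    """Toy FFN y = W_out @ relu(W_in @ x) with hidden neurons split into disjoint modules."""

    def __init__(self, d_model, d_ff, num_modules, seed=0):
        rng = random.Random(seed)
        # W_in is d_ff x d_model and W_out is d_model x d_ff; both are stored once.
        self.w_in = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_ff)]
        self.w_out = [[rng.uniform(-1, 1) for _ in range(d_ff)] for _ in range(d_model)]
        # Assumption: a contiguous, equal-size partition of hidden neurons into modules.
        size = d_ff // num_modules
        self.modules = [list(range(m * size, (m + 1) * size)) for m in range(num_modules)]

    def forward_neurons(self, x, neurons):
        """FFN output using only the given hidden neurons (rows of W_in, columns of W_out)."""
        y = [0.0] * len(self.w_out)
        for j in neurons:
            h = relu(sum(w * xi for w, xi in zip(self.w_in[j], x)))
            for i in range(len(y)):
                y[i] += self.w_out[i][j] * h
        return y

    def dense(self, x):
        """Full FFN output over all hidden neurons."""
        return self.forward_neurons(x, range(len(self.w_in)))

    def expert(self, x, module_ids):
        """An expert is the union of its modules; overlapping experts read the same weights."""
        neurons = [j for m in sorted(set(module_ids)) for j in self.modules[m]]
        return self.forward_neurons(x, neurons)


def route_top1(ffn, gate, x):
    """Hypothetical top-1 linear router: score each expert, then run only the best one."""
    scores = [sum(g * xi for g, xi in zip(row, x)) for row in gate]
    k = max(range(len(scores)), key=scores.__getitem__)
    return k, ffn.expert(x, EXPERTS[k])


def param_counts(d_model, d_ff, num_modules, experts):
    """FFN weights stored with module sharing vs. with a private copy per expert."""
    per_module = 2 * d_model * (d_ff // num_modules)  # the module's W_in rows + W_out columns
    shared = per_module * len({m for e in experts for m in e})
    unshared = per_module * sum(len(e) for e in experts)
    return shared, unshared


if __name__ == "__main__":
    ffn = SharedModuleFFN(D_MODEL, D_FF, NUM_MODULES)
    rng = random.Random(1)
    gate = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in EXPERTS]
    x = [0.5, -1.0, 0.25, 2.0]
    k, y = route_top1(ffn, gate, x)
    print("routed to expert", k, "using modules", EXPERTS[k])
    print("params (shared, unshared):", param_counts(D_MODEL, D_FF, NUM_MODULES, EXPERTS))
```

A bias-free FFN's output is a sum of per-neuron contributions. As a result, the dense output equals the sum of the disjoint modules' outputs, and running one expert costs only its own modules' neurons. With these toy sizes, `param_counts` returns `(64, 96)`: three overlapping experts reuse four stored modules instead of holding six private copies.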