Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, Jianfeng Gao
Abstract
In this work, we address the memory overhead of deploying Mixture-of-Experts (MoE) architectures in Large Language Models (LLMs). While MoE layers improve LLM performance without increasing inference costs, the ever-growing number of experts inflates memory requirements, hindering practical deployment. Our empirical study reveals that some experts encode redundant knowledge during pre-training. We thus propose a method of grouping and pruning similar experts to improve the model’s parameter efficiency. We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures, including Mixtral, Deepseek-MoE, and Qwen. The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks.
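The abstract describes the method only at a high level. The sketch below is a minimal, hypothetical illustration of the general idea of grouping similar experts and keeping one representative per group; the cosine-similarity criterion, the greedy grouping, the threshold, and the name `group_and_prune_experts` are all assumptions made for illustration, not the authors' actual algorithm (see the paper PDF below for details).

```python
# Illustrative sketch only: assumes each expert is summarized by its flattened
# weight matrix and that pairwise cosine similarity is a reasonable proxy for
# redundant knowledge. The paper's actual criterion may differ.
import torch
import torch.nn.functional as F

def group_and_prune_experts(expert_weights: list[torch.Tensor],
                            sim_threshold: float = 0.95) -> list[int]:
    """Greedily group experts whose flattened weights are highly similar and
    keep one representative per group. Returns indices of the kept experts."""
    flat = torch.stack([w.flatten() for w in expert_weights])  # (E, D)
    flat = F.normalize(flat, dim=1)                            # unit-norm rows
    sim = flat @ flat.T                                        # (E, E) cosine similarities

    kept, pruned = [], set()
    for i in range(len(expert_weights)):
        if i in pruned:
            continue
        kept.append(i)  # expert i represents its group
        # Mark every remaining expert that is too similar to i as redundant.
        for j in range(i + 1, len(expert_weights)):
            if sim[i, j] >= sim_threshold:
                pruned.add(j)
    return kept

# Toy usage: 8 random experts with two near-duplicate pairs, which should
# collapse to 6 kept experts, e.g. [0, 2, 3, 4, 6, 7].
torch.manual_seed(0)
experts = [torch.randn(16, 16) for _ in range(8)]
experts[1] = experts[0] + 0.01 * torch.randn(16, 16)  # near-duplicate of expert 0
experts[5] = experts[4] + 0.01 * torch.randn(16, 16)  # near-duplicate of expert 4
print(group_and_prune_experts(experts))
```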
- Anthology ID: 2025.findings-acl.4
- Volume: Findings of the Association for Computational Linguistics: ACL 2025
- Month: July
- Year: 2025
- Address: Vienna, Austria
- Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venues: Findings | WS
- Publisher: Association for Computational Linguistics
- Pages: 86–102
- URL: https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.4/
- Cite (ACL): Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, and Jianfeng Gao. 2025. Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts. In Findings of the Association for Computational Linguistics: ACL 2025, pages 86–102, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal): Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts (Zhang et al., Findings 2025)
- PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.4.pdf