Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts

Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, Jianfeng Gao


Abstract
In this work, we address the memory overhead of deploying Mixture-of-Experts (MoE) architectures in Large Language Models (LLMs). While MoE layers improve LLM performance without increasing inference costs, the ever-growing number of experts inflates memory requirements, hindering practical deployment. Our empirical study reveals that some experts encode redundant knowledge during pre-training. We thus propose a method of grouping and pruning similar experts to improve the model’s parameter efficiency. We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures, including Mixtral, Deepseek-MoE, and Qwen. The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks.
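For illustration only, here is a minimal sketch of the general idea described in the abstract: group the experts of one MoE layer by pairwise similarity and keep a single representative per group, pruning the rest. The function name, the use of cosine similarity over flattened expert weights, and the threshold-based grouping rule are assumptions made for this sketch, not the authors' implementation or the paper's exact criterion.

```python
import torch
import torch.nn.functional as F

def group_and_prune_experts(expert_weights, similarity_threshold=0.95):
    """Return indices of experts to keep: one representative per group of
    experts whose flattened weights have cosine similarity >= threshold.
    NOTE: illustrative sketch only, not the paper's algorithm."""
    # Stack flattened expert weights into an (E, D) matrix and L2-normalize rows.
    flat = torch.stack([w.flatten() for w in expert_weights])
    flat = F.normalize(flat, dim=1)
    sim = flat @ flat.T  # (E, E) pairwise cosine similarities

    keep, assigned = [], set()
    for i in range(len(expert_weights)):
        if i in assigned:
            continue
        keep.append(i)  # expert i becomes the representative of a new group
        # All experts highly similar to i (including i itself) join its group
        # and are pruned in favor of the representative.
        group = torch.nonzero(sim[i] >= similarity_threshold).flatten().tolist()
        assigned.update(group)
    return keep

# Toy usage: 8 random "experts" with identical weight shapes.
experts = [torch.randn(512, 2048) for _ in range(8)]
kept_indices = group_and_prune_experts(experts, similarity_threshold=0.95)
print(kept_indices)
```

In practice, expert similarity could also be measured from activations or routing statistics rather than raw weights; the weight-based variant above is simply the most self-contained one for a sketch.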
Anthology ID: 2025.findings-acl.4
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues: Findings | WS
Publisher: Association for Computational Linguistics
Pages: 86–102
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.4/
Cite (ACL): Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, and Jianfeng Gao. 2025. Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts. In Findings of the Association for Computational Linguistics: ACL 2025, pages 86–102, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts (Zhang et al., Findings 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.4.pdf