TENP: Trapezoidal Expert Neuron Pruning For Mixture-of-Experts

Jiangyang He, Shaolin Zhu, Deyi Xiong


Abstract
Mixture-of-Experts large language models (LLMs) scale efficiently through sparse activation, yet their deployment is fundamentally constrained by the large static parameter footprint of their experts. Existing compression approaches either remove entire experts, which disrupts the routing topology and harms performance, or rely on unstructured weight pruning, which offers limited practical efficiency. To address these limitations, we propose TENP, a structured **T**rapezoidal **E**xpert **N**euron **P**runing framework. Using only a few samples, we identify and retain important experts while applying expert neuron pruning (ENP) to less important ones, so that the retained parameters follow a trapezoidal pattern from shallow to deep layers. To evaluate expert importance, we jointly consider the magnitude of an expert's output and its ability to change the direction of the input vector. For ENP, we measure each neuron's projected contribution to the expert output to identify and retain important neurons. We conduct extensive experiments on Qwen and DeepSeek models. With a routing expert sparsity of 40% and an average of 63.76% activated expert parameters, the DeepSeek model suffers only a 1-point drop in accuracy compared to the full-parameter model. Moreover, it outperforms the full-parameter model by 10% on code generation tasks.
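To make the three ingredients of the abstract concrete (an expert score combining output magnitude with the change in input direction, a per-neuron projected-contribution score, and a layer-wise trapezoidal keep schedule), here is a minimal NumPy sketch under our own assumptions. It is not the authors' implementation. It assumes SwiGLU experts with hypothetical weights `w_gate`, `w_up`, `w_down`. It scores an expert as the mean of ||y|| times (1 - cos(x, x + y)), one possible reading of "ability to change the direction of the input vector". It scores neuron i by projecting its contribution h_i * w_down[:, i] onto the unit direction of the expert output y. The trapezoid is modeled as a linear keep-ratio schedule with hypothetical endpoints `keep_shallow` and `keep_deep`; its direction (more retained in shallow layers) is itself an assumption. All function names and the `frac_important` threshold are also hypothetical, and the sketch feeds every token to every expert instead of only routed tokens.

```python
import numpy as np


def silu(z: np.ndarray) -> np.ndarray:
    """SiLU activation used in SwiGLU feed-forward blocks."""
    return z / (1.0 + np.exp(-z))


def expert_forward(x, w_gate, w_up, w_down):
    """Assumed SwiGLU expert. x: (tokens, d); w_gate, w_up: (d_ff, d); w_down: (d, d_ff)."""
    h = silu(x @ w_gate.T) * (x @ w_up.T)  # neuron activations, (tokens, d_ff)
    y = h @ w_down.T                        # expert output, (tokens, d)
    return h, y


def expert_importance(x, y, eps=1e-8):
    """Hypothetical expert score: output norm times how far x + y turns away from x."""
    magnitude = np.linalg.norm(y, axis=-1)
    z = x + y
    cos = np.sum(x * z, axis=-1) / (
        np.linalg.norm(x, axis=-1) * np.linalg.norm(z, axis=-1) + eps
    )
    return float(np.mean(magnitude * (1.0 - cos)))


def neuron_scores(h, y, w_down, eps=1e-8):
    """Mean projection of each neuron's contribution h[t, i] * w_down[:, i] onto y[t] / ||y[t]||."""
    y_dir = y / (np.linalg.norm(y, axis=-1, keepdims=True) + eps)  # (tokens, d)
    proj = h * (y_dir @ w_down)  # (tokens, d_ff); each row sums to about ||y[t]||
    return proj.mean(axis=0)


def trapezoid_keep_ratios(num_layers, keep_shallow=0.8, keep_deep=0.4):
    """Assumed linear schedule of the neuron keep fraction from the first to the last layer."""
    return np.linspace(keep_shallow, keep_deep, num_layers).tolist()


def prune_expert(w_gate, w_up, w_down, scores, keep_ratio):
    """Keep the top-scoring neurons (rows of w_gate/w_up, columns of w_down)."""
    k = max(1, int(round(keep_ratio * scores.shape[0])))
    keep = np.sort(np.argsort(scores)[::-1][:k])  # preserve original neuron order
    return w_gate[keep], w_up[keep], w_down[:, keep]


def compress_layer(x, experts, keep_ratio, frac_important=0.5):
    """Keep the most important experts whole and neuron-prune the rest.

    Simplification: every token in x is fed to every expert instead of only routed tokens.
    """
    outputs = [expert_forward(x, *e) for e in experts]
    importance = np.array([expert_importance(x, y) for _, y in outputs])
    n_whole = max(1, int(round(frac_important * len(experts))))
    whole = set(np.argsort(importance)[::-1][:n_whole].tolist())
    compressed = []
    for idx, (e, (h, y)) in enumerate(zip(experts, outputs)):
        if idx in whole:
            compressed.append(e)
        else:
            compressed.append(prune_expert(*e, neuron_scores(h, y, e[2]), keep_ratio))
    return compressed


# Toy usage with random weights (no real model is loaded).
rng = np.random.default_rng(0)
d, d_ff, n_experts, n_layers = 16, 32, 4, 3
x = rng.normal(size=(64, d))
experts = [
    (0.1 * rng.normal(size=(d_ff, d)),
     0.1 * rng.normal(size=(d_ff, d)),
     0.1 * rng.normal(size=(d, d_ff)))
    for _ in range(n_experts)
]
ratios = trapezoid_keep_ratios(n_layers)
new_experts = compress_layer(x, experts, keep_ratio=ratios[-1])
print([e[0].shape[0] for e in new_experts])  # neurons kept per expert
```

One property motivates the projection-based neuron score in this sketch: for each token, the per-neuron projections sum to the norm of the expert output. The scores therefore split the expert's output magnitude additively across neurons, and negative scores flag neurons that push against the output direction.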
Anthology ID: 2026.findings-acl.1049
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 20911–20925
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1049/
Cite (ACL): Jiangyang He, Shaolin Zhu, and Deyi Xiong. 2026. TENP: Trapezoidal Expert Neuron Pruning For Mixture-of-Experts. In Findings of the Association for Computational Linguistics: ACL 2026, pages 20911–20925, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): TENP: Trapezoidal Expert Neuron Pruning For Mixture-of-Experts (He et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1049.pdf
Checklist: 2026.findings-acl.1049.checklist.pdf