Abstract
As the scale of language models (LMs) continues to grow, there is heightened interest in reducing the inference cost associated with these models. Mixture-of-Experts (MoEs) present an efficient alternative to dense models, but existing methods for converting pretrained dense models into MoEs are limited to ReLU-based models with natural sparsity. This paper introduces G-MoEfication, applicable to arbitrary dense models, where ReLU-based activation-sparsity assumptions no longer hold. In generalizing, we face a dilemma: deactivated experts must be zeroed out, yet excessive zeroing-out must be avoided to retain dense activation information. We publicly release our code and report results with mBERT, SantaCoder-1.1B, Phi-2-2.7B, and Falcon-7B, demonstrating the efficacy of our approach in general scenarios: from multitask to multilingual, and from fine-tuning to zero-shot evaluation.
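To make the idea concrete, here is a minimal, hypothetical sketch of MoEfication in PyTorch: a pretrained dense FFN's hidden units are partitioned into contiguous expert groups, and a top-k router zeroes out the contributions of deactivated experts. The class name, the learned linear router, and the contiguous grouping are illustrative assumptions; this is not the paper's G-MoEfication procedure, and a real implementation would compute only the active experts' slices rather than masking a full dense pass.

```python
# Hypothetical MoEfication sketch (not the paper's G-MoEfication):
# split a dense FFN's hidden neurons into expert groups and zero out
# the hidden activations of experts not selected by a top-k router.
import torch
import torch.nn as nn


class MoEfiedFFN(nn.Module):
    def __init__(self, dense_in: nn.Linear, dense_out: nn.Linear,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        d_model, d_hidden = dense_in.in_features, dense_in.out_features
        assert d_hidden % num_experts == 0
        self.num_experts, self.top_k = num_experts, top_k
        self.chunk = d_hidden // num_experts        # hidden units per expert
        self.w_in, self.w_out = dense_in, dense_out  # reuse pretrained weights
        self.act = nn.GELU()                         # non-ReLU, no natural sparsity
        self.router = nn.Linear(d_model, num_experts)  # would be trained/calibrated

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        h = self.act(self.w_in(x))                       # (batch, seq, d_hidden)
        scores = self.router(x)                          # (batch, seq, num_experts)
        top = scores.topk(self.top_k, dim=-1).indices    # active experts per token
        mask = torch.zeros_like(scores).scatter_(-1, top, 1.0)
        # Broadcast each expert's 0/1 mask over its slice of hidden units,
        # zeroing the activations of deactivated experts.
        mask = mask.repeat_interleave(self.chunk, dim=-1)  # (batch, seq, d_hidden)
        return self.w_out(h * mask)


if __name__ == "__main__":
    d_model, d_hidden = 64, 256
    ffn_in, ffn_out = nn.Linear(d_model, d_hidden), nn.Linear(d_hidden, d_model)
    moe = MoEfiedFFN(ffn_in, ffn_out, num_experts=8, top_k=2)
    print(moe(torch.randn(2, 5, d_model)).shape)  # torch.Size([2, 5, 64])
```

The dilemma the abstract describes shows up in the masking step: zeroing too few hidden units forfeits the inference savings, while zeroing too many discards activation information that a non-ReLU dense model still relies on.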
- Anthology ID: 2024.emnlp-main.563
- Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 10097–10107
- URL: https://preview.aclanthology.org/add_missing_videos/2024.emnlp-main.563/
- DOI: 10.18653/v1/2024.emnlp-main.563
- Cite (ACL): Jaeseong Lee, Seung-won Hwang, Wonpyo Park, and Mingi Ji. 2024. Breaking ReLU Barrier: Generalized MoEfication for Dense Pretrained Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 10097–10107, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): Breaking ReLU Barrier: Generalized MoEfication for Dense Pretrained Models (Lee et al., EMNLP 2024)
- PDF: https://preview.aclanthology.org/add_missing_videos/2024.emnlp-main.563.pdf