MoLA: MoE LoRA with Layer-wise Expert Allocation
Chongyang Gao | Kezhen Chen | Jinmeng Rao | Ruibo Liu | Baochen Sun | Yawen Zhang | Daiyi Peng | Xiaoyuan Guo | V.S. Subrahmanian
Findings of the Association for Computational Linguistics: NAACL 2025
Recent efforts to integrate low-rank adaptation (LoRA) with the Mixture-of-Experts (MoE) architecture have managed to achieve performance comparable to full-parameter fine-tuning while tuning far fewer parameters. Despite promising results, research on improving the efficiency of LoRA with MoE and on analyzing its experts is still in its early stages. Recent studies have shown that experts in the MoE architecture have different strengths and also exhibit some redundancy. Does this observation also apply to parameter-efficient MoE? In this paper, we introduce a novel parameter-efficient MoE method, MoE-LoRA with Layer-wise Expert Allocation (MoLA), for Transformer-based models, where each model layer uses a varying number of LoRA experts. We investigate several architectures with varying layer-wise expert configurations. Experiments on six well-known NLP and commonsense QA benchmarks demonstrate that MoLA achieves equal or superior performance compared to all baselines on top of LLAMA-2, Mistral, and Gemma. We find that, for a fixed total number of experts, allocating more LoRA experts to the middle layers further enhances model effectiveness, and that expert redundancy is more pronounced in the lower layers. With far fewer parameters, this allocation strategy outperforms the setting with the same number of experts in every layer. This work can be widely used as a plug-and-play parameter-efficient tuning approach for various applications. The code has been made available at https://github.com/GCYZSL/MoLA.
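To make the layer-wise allocation idea concrete, below is a minimal sketch of a MoE-of-LoRA linear layer with a per-layer expert count. It is not the authors' implementation; the names `LoRAExpert`, `MoLALayer`, and `experts_per_layer`, as well as the top-k routing and the dense expert evaluation, are illustrative assumptions consistent with the abstract's description.

```python
# Sketch only: a frozen base linear layer plus a router over a configurable
# number of LoRA experts, with the expert count varying per layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One low-rank adapter: x -> (alpha / r) * B(A(x))."""

    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.A = nn.Linear(d_in, r, bias=False)
        self.B = nn.Linear(r, d_out, bias=False)
        nn.init.zeros_(self.B.weight)  # adapters start as a zero update
        self.scale = alpha / r

    def forward(self, x):
        return self.B(self.A(x)) * self.scale


class MoLALayer(nn.Module):
    """Frozen base linear layer plus a router over `num_experts` LoRA experts."""

    def __init__(self, base_linear, num_experts, top_k=2, r=8):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False  # only adapters and the router are trained
        self.experts = nn.ModuleList(
            LoRAExpert(base_linear.in_features, base_linear.out_features, r=r)
            for _ in range(num_experts)
        )
        self.router = nn.Linear(base_linear.in_features, num_experts, bias=False)
        self.top_k = min(top_k, num_experts)

    def forward(self, x):
        gates = F.softmax(self.router(x), dim=-1)            # (..., E)
        topv, topi = gates.topk(self.top_k, dim=-1)          # select top-k experts per token
        topv = topv / topv.sum(dim=-1, keepdim=True)         # renormalize selected gates
        # Dense sketch: evaluate every expert, then keep only the selected ones.
        expert_out = torch.stack([e(x) for e in self.experts], dim=-2)   # (..., E, d_out)
        mask = torch.zeros_like(gates).scatter_(-1, topi, topv)          # (..., E)
        return self.base(x) + (mask.unsqueeze(-1) * expert_out).sum(dim=-2)


# Layer-wise allocation: e.g. fewer experts in the lower layers, more in the
# middle layers. The allocation below is hypothetical, for a 6-layer model.
experts_per_layer = [2, 2, 4, 8, 8, 4]
layers = nn.ModuleList(
    MoLALayer(nn.Linear(768, 768), num_experts=n) for n in experts_per_layer
)
```

The point of the sketch is that the expert count is a per-layer argument rather than a global constant, so allocations such as "more experts in middle layers, fewer in lower layers" are expressed simply by the `experts_per_layer` list.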