Donghong Han


2025

pdf bib
SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture
Jiayi Han | Liang Du | Hongwei Du | Xiangguo Zhou | Yiwen Wu | Yuanfang Zhang | Weibo Zheng | Donghong Han
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Despite the recent efforts from the NLP community, balancing the training budget, downstream performance, and general capabilities of large language models (LLM) remains a challenge in many applications. Training the entire model for downstream tasks is expensive, and could easily result in catastrophic forgetting. Parameter-efficient fine-tuning (PEFT) could reduce the training cost, but it still suffers from forgetting, and limits the learning on the downstream tasks. To address the aforementioned issues, we propose a novel mixture of expert (MoE) framework based on Soft LoRA and Identity Mixture (SLIM). SLIM allows dynamic routing between LoRA adapters and identity layers, thus enabling the bypass of LoRA adapters to suppress forgetting of general capacity. We adopt weight yielding with sliding clustering for better out-of-domain distinguish to enhance the routing. We also convert the mixture of LoRA adapters to the model merging formulation and introduce dynamic merging with its fast implementation for LoRA adapters to keep the general capabilities. Extensive experiments demonstrate that the proposed SLIM is comparable to the state-of-the-art PEFT approaches on the downstream tasks while achieving the leading performance in mitigating catastrophic forgetting. We plan to open-source the code upon publication.