Kun Zhang
2025
MoRE: A Mixture of Low-Rank Experts for Adaptive Multi-Task Learning
Dacao Zhang | Kun Zhang | Shimao Chu | Le Wu | Xin Li | Si Wei
Findings of the Association for Computational Linguistics: ACL 2025
With the rapid development of Large Language Models (LLMs), Parameter-Efficient Fine-Tuning (PEFT) methods, which aim to fine-tune LLMs efficiently with fewer parameters, have gained significant attention. As a representative PEFT method, Low-Rank Adaptation (LoRA) introduces low-rank matrices to approximate the incremental tuning parameters and achieves impressive performance across multiple scenarios. Since then, many variants have been proposed to improve it further. However, these methods either focus on single-task scenarios or separately train multiple LoRA modules for multi-task scenarios, limiting the efficiency and effectiveness of LoRA in multi-task settings. To better adapt to multi-task fine-tuning, in this paper we propose a novel Mixture of Low-Rank Experts (MoRE) for multi-task PEFT. Specifically, instead of using an individual LoRA for each task, we align LoRA modules of different ranks with different tasks, which we call low-rank experts. Moreover, we design a novel adaptive rank selector to choose the appropriate expert for each task. By jointly training the low-rank experts, MoRE enhances the adaptability and efficiency of LoRA in multi-task scenarios. Finally, we conduct extensive experiments on multiple multi-task benchmarks with different LLMs to verify model performance. Experimental results demonstrate that, compared to traditional LoRA and its variants, MoRE significantly improves the performance of LLMs in multi-task scenarios and incurs no additional inference cost. We also release the model and code to facilitate the community.
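The abstract's core idea — one low-rank expert per candidate rank, with a selector routing each task to one expert — can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the ranks, the zero-initialized B matrices (standard LoRA initialization), and the fixed per-task selector scores (`task_logits`) are all illustrative assumptions; the paper learns the selector jointly with the experts.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 16, 16
ranks = [2, 4, 8]  # candidate ranks; one "low-rank expert" per rank

# Frozen pretrained weight plus one low-rank expert (A, B) per rank.
W = rng.normal(size=(d_out, d_in))
experts = {r: (rng.normal(size=(r, d_in)) * 0.1,  # A: projects d_in -> r
               np.zeros((d_out, r)))              # B: projects r -> d_out (zero init, as in LoRA)
           for r in ranks}

# Hypothetical adaptive rank selector: a score per candidate rank for each task.
# In the paper this selector is trained; here the scores are fixed for illustration.
task_logits = {"task_a": np.array([0.1, 2.0, 0.3]),
               "task_b": np.array([1.5, 0.2, 0.1])}

def select_rank(task):
    """Pick the expert whose selector score is highest for this task."""
    return ranks[int(np.argmax(task_logits[task]))]

def forward(x, task):
    """Frozen base output plus the selected expert's low-rank update B @ (A @ x)."""
    r = select_rank(task)
    A, B = experts[r]
    return W @ x + B @ (A @ x)

x = rng.normal(size=d_in)
print(select_rank("task_a"))  # rank chosen for task_a
```

Because every expert shares the same frozen base weight `W`, routing a task to a different rank changes only the small low-rank update, which is why no extra inference cost is incurred once the selector's choice is fixed.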
EDGE: Enhanced Debiased Gradient Extraction for Robust Fine-tuning
Jinglong Li | Kun Zhang | Chenyu Zou | Wei Shi | Xin Li | Si Wei
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
"Recent advances in large-scale pre-training have substantially enhanced the robustness and generalization capabilities of foundation models (e.g., Qwen3 and Llama-4). However, when fine-tuning them on downstream tasks, these models often latch onto dataset-specific biases, learning spurious correlations tied to easy-to-learn but non-robust features. This undermines their performance under distribution shifts, despite strong in-distribution (ID) accuracy. Existing fine-tuning methods, including full-parameter and parameter-efficient techniques, primarily optimize for ID performance and largely overlook out-of-distribution (OOD) robustness. Meanwhile, debiasing has been explored in full fine-tuning, while debiasing strategies on Parameter-Efficient Fine-Tuning (PEFT) remain underexplored. To this end, in this paper, we propose Enhanced Debiased Gradient Extraction (EDGE), a lightweight gradient projection-based method that explicitly suppresses bias-amplifying updates during fine-tuning process. EDGE is a model-agnostic, and plug-and-play debiasing method that operates without relying on predefined bias types or labels.It seamlessly integrates with both full and parameter-efficient fine-tuning, and generalizes acrossNLP and vision tasks. Experiments on synthetic and real-world benchmarks demonstrate thatEDGE effectively reduces bias and consistently improves OOD generalization, offering a unified and practical framework for robust adaptation under dataset bias."