Yandu Sun
2026
PASs-MoE: Mitigating Misaligned Co-drift among Router and Experts via Pathway Activation Subspaces for Continual Learning
ZhiYan Hou | Haiyun Guo | Haokai Ma | Yandu Sun | Yonghui Yang | Jinqiao Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Continual instruction tuning (CIT) requires multimodal large language models (MLLMs) to adapt to a stream of tasks without forgetting prior capabilities. A common strategy is to isolate updates by routing inputs to different LoRA experts. However, existing LoRA-based Mixture-of-Experts (MoE) methods often update the router and experts jointly and indiscriminately, causing the router’s preferences to co-drift with the experts’ adaptation pathways and gradually deviate from early-stage input–expert specialization. We term this phenomenon ***Misaligned Co-drift***; it blurs expert responsibilities and exacerbates forgetting. To address it, we introduce the ***pathway activation subspace (PASs)***, a LoRA-induced subspace that reflects which low-rank pathway directions an input activates in each expert, providing a capability-aligned coordinate system for routing and preservation. Based on PASs, we propose a fixed-capacity PASs-based MoE–LoRA method with two components: PAS-guided Reweighting, which calibrates routing using each expert’s pathway activation signals, and PAS-aware Rank Stabilization, which selectively stabilizes rank directions important to previous tasks. Experiments on a CIT benchmark show that our approach consistently outperforms a range of conventional continual learning baselines and MoE–LoRA variants in both accuracy and resistance to forgetting, without increasing model parameters. Our code is publicly available at https://github.com/yueluoshuangtian/PASs-MoE.
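
The following is a rough illustrative sketch of the general idea only, not the authors' implementation. It assumes each LoRA expert is a pair of matrices (A, B), treats the rank-space projection z = A x as that expert's "pathway activation" for input x, and calibrates router logits by adding the log of the activation energy. The calibration rule and every function name here are hypothetical choices made for illustration; the abstract does not specify them.

```python
import math

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def softmax(v):
    """Numerically stable softmax over a list of floats."""
    top = max(v)
    exps = [math.exp(a - top) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def pathway_activation(A, x):
    """Project x into an expert's rank-r LoRA space: z = A x."""
    return matvec(A, x)

def reweighted_routing(router_logits, experts_A, x, eps=1e-8):
    """Assumed calibration: boost experts whose low-rank pathway
    the input activates strongly (logit + log of activation energy)."""
    energies = [sum(z * z for z in pathway_activation(A, x)) for A in experts_A]
    adjusted = [l + math.log(e + eps) for l, e in zip(router_logits, energies)]
    return softmax(adjusted)

def moe_lora_delta(x, experts_A, experts_B, weights):
    """Weighted sum of LoRA updates: sum_e w_e * B_e (A_e x)."""
    out = [0.0] * len(experts_B[0])
    for A, B, w in zip(experts_A, experts_B, weights):
        y = matvec(B, matvec(A, x))
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

# Two rank-1 experts over a 3-dimensional input and 2-dimensional output.
experts_A = [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]]]
experts_B = [[[1.0], [0.0]], [[0.0], [1.0]]]
x = [2.0, 0.0, 0.0]

weights = reweighted_routing([0.0, 0.0], experts_A, x)
delta = moe_lora_delta(x, experts_A, experts_B, weights)
```

In this toy setup the input lies entirely along the first expert's rank direction, so the calibrated router sends almost all of the weight to that expert even though the raw router logits are tied.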