From Experts to Bases: Orthogonal Subspace Mixture for Continual Multimodal Instruction Tuning

Pei Chen, Xilai Wang, Shiqixu, Zejian Li, Lingyun Sun


Abstract
Multimodal Continual Instruction Tuning (MCIT) is essential for adapting Multimodal Large Language Models (MLLMs) to dynamic data streams, yet preventing catastrophic forgetting remains a major challenge. Existing parameter-efficient approaches often face a dilemma: fixed architectures suffer from knowledge interference, while dynamic strategies incur inefficient capacity expansion, limiting scalability. We propose MoBLoRA (Mixture-of-Bases LoRA), a novel framework for MCIT. Motivated by our geometric analysis revealing subspace redundancy across sequential tasks, MoBLoRA shifts the paradigm from expert selection to subspace mixing: it decomposes adaptation weights into a globally shared pool of orthonormal bases to capture task-invariant knowledge, and lightweight mixing matrices to encode task-specific variations. This design effectively decouples knowledge accumulation from task reconstruction. Experiments on standard benchmarks show MoBLoRA significantly outperforms state-of-the-art methods while maintaining superior parameter efficiency.
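The decomposition described in the abstract can be pictured with a minimal NumPy sketch. It is an illustration under assumptions, not the authors' implementation: the abstract does not give the exact parameterization, so the form `delta_W_t = B @ M_t`, the QR-based construction of the bases, the shapes, and every name here (`make_orthonormal_bases`, `task_delta`, `mixing_by_task`) are hypothetical.

```python
import numpy as np

def make_orthonormal_bases(dim, num_bases, seed=0):
    """Hypothetical shared pool: columns are orthonormal (built via reduced QR)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, num_bases)))
    return q  # shape (dim, num_bases)

def task_delta(bases, mixing):
    """Assumed reconstruction of a task's weight update: delta_W_t = B @ M_t."""
    return bases @ mixing

dim_in, dim_out, num_bases = 64, 32, 8

# One pool of bases shared by every task (assumption: fixed size, no expansion).
bases = make_orthonormal_bases(dim_in, num_bases)

# One small mixing matrix per task; only these differ between tasks.
rng = np.random.default_rng(1)
mixing_by_task = {
    task_id: 0.01 * rng.standard_normal((num_bases, dim_out))
    for task_id in range(3)
}

delta_w = {t: task_delta(bases, m) for t, m in mixing_by_task.items()}

# Per-task cost is num_bases * dim_out values, versus dim_in * dim_out for a dense update.
per_task_params = num_bases * dim_out
dense_params = dim_in * dim_out
```

In this sketch each task stores only its `num_bases x dim_out` mixing matrix, while the basis pool is stored once. That is one plausible reading of how a shared pool could separate what is accumulated across tasks from what is needed to reconstruct a single task.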
Anthology ID:
2026.acl-long.481
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
10545–10561
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.481/
Cite (ACL):
Pei Chen, Xilai Wang, Shiqixu, Zejian Li, and Lingyun Sun. 2026. From Experts to Bases: Orthogonal Subspace Mixture for Continual Multimodal Instruction Tuning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10545–10561, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
From Experts to Bases: Orthogonal Subspace Mixture for Continual Multimodal Instruction Tuning (Chen et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.481.pdf
Checklist:
 2026.acl-long.481.checklist.pdf