UQ-Merge: Uncertainty Guided Multimodal Large Language Model Merging
Huaizhi Qu, Xinyu Zhao, Jie Peng, Kwonjoon Lee, Behzad Dariush, Tianlong Chen
Abstract
Multimodal Large Language Models (MLLMs) have gained increasing popularity as a promising framework for leveraging strong language reasoning capabilities in the vision-language domain. Given the wide range of available MLLMs, model merging potentially offers a cheap way to aggregate their diverse knowledge into a single MLLM. However, directly plugging in existing model merging approaches often leads to suboptimal performance due to (1) the inclusion of harmful models that make over-confident predictions on the target task and (2) the lack of specialized designs for vision-language inputs. To tackle these pain points, we conduct pioneering investigations to dissect the merging procedure and propose an uncertainty-guided MLLM merging algorithm, UQ-Merge, which i) identifies beneficial candidates for merging, ii) determines the merging order and the number of helpful candidates, and iii) performs appropriate merging. Within our framework, we apply uncertainty quantification to both text and vision inputs to examine MLLM prediction confidence, and then decide whether and when an MLLM needs to be included. Notably, our vision-language uncertainty quantification does not require access to sample labels, making it more practical in various scenarios. Extensive experiments consistently demonstrate the superior MLLM merging performance of UQ-Merge on both held-in and held-out vision-language benchmarks. For example, compared to existing state-of-the-art merging methods, UQ-Merge brings substantial improvements of up to 44.3% in average accuracy across 12 datasets. Code is available at https://anonymous.4open.science/r/UQ-Merge-7CD7.
- Anthology ID:
- 2025.findings-acl.73
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2025
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1401–1417
- URL:
- https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.findings-acl.73/
- DOI:
- 10.18653/v1/2025.findings-acl.73
- Cite (ACL):
- Huaizhi Qu, Xinyu Zhao, Jie Peng, Kwonjoon Lee, Behzad Dariush, and Tianlong Chen. 2025. UQ-Merge: Uncertainty Guided Multimodal Large Language Model Merging. In Findings of the Association for Computational Linguistics: ACL 2025, pages 1401–1417, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- UQ-Merge: Uncertainty Guided Multimodal Large Language Model Merging (Qu et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2025.findings-acl.73.pdf
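The three steps named in the abstract (identify candidates, decide the order and number to merge, then merge) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the uncertainty measure, the ranking direction, or the merge operator. The sketch assumes predictive entropy as a label-free uncertainty score, assumes lower uncertainty is preferable, and uses uniform weight averaging as a placeholder merge. All names (`predictive_entropy`, `average_weights`, `uncertainty_guided_merge`) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Toy stand-in for a model: parameter name -> scalar value (assumption).
Weights = Dict[str, float]


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one predicted distribution; needs no label."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def average_weights(models: List[Weights]) -> Weights:
    """Uniform parameter averaging, a placeholder merge operator (assumption)."""
    return {k: sum(m[k] for m in models) / len(models) for k in models[0]}


def uncertainty_guided_merge(
    candidates: Dict[str, Weights],
    uncertainty: Callable[[Weights], float],
) -> Tuple[List[str], Weights]:
    """Greedy sketch: rank candidates by uncertainty, keep a merge only if it helps."""
    # Merging order: least uncertain candidate first (assumed direction).
    order = sorted(candidates, key=lambda name: uncertainty(candidates[name]))
    selected = [order[0]]
    best = candidates[order[0]]
    best_u = uncertainty(best)
    for name in order[1:]:
        trial = average_weights([candidates[n] for n in selected + [name]])
        trial_u = uncertainty(trial)
        # Include the candidate only when the merged model is less uncertain.
        if trial_u < best_u:
            selected.append(name)
            best, best_u = trial, trial_u
    return selected, best
```

In practice the `uncertainty` callable would run the merged model on unlabeled vision-language samples and aggregate a score such as the mean of `predictive_entropy` over its output distributions; here it is left as a parameter so the selection loop stays independent of any particular measure.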