UQ-Merge: Uncertainty Guided Multimodal Large Language Model Merging

Huaizhi Qu; Xinyu Zhao; Jie Peng; Kwonjoon Lee; Behzad Dariush; Tianlong Chen

Abstract

Multimodal Large Language Models (MLLMs) have gained increasing popularity as a promising framework for leveraging strong language reasoning capabilities in the vision-language domain. Given a wide range of MLLMs, model merging potentially offers a cheap way to aggregate their diverse knowledge into a single MLLM. However, directly plugging in existing model merging approaches often leads to suboptimal performance due to (1) the inclusion of harmful models that make over-confident predictions on the target task; (2) the lack of specialized designs for vision-language inputs. To tackle these pain points, we conduct pioneering investigations to dissect the merging procedures and propose an uncertainty-guided MLLM merging algorithm, i.e., UQ-Merge, which i) identifies beneficial candidates for merging, ii) determines the merging order and the number of helpful candidates, and iii) performs appropriate merging. Within our framework, we consider uncertainty quantification on both text and vision inputs to examine the MLLM prediction confidence, and then decide whether and when an MLLM needs to be included. It is worth mentioning that our vision-language uncertainty quantification does not require access to sample labels, making it more practical in various scenarios. Extensive experiments consistently demonstrate the superior MLLM merging performance of UQ-Merge on both held-in and held-out vision-language benchmarks. For example, compared to existing state-of-the-art merging methods, UQ-Merge brings substantial performance improvements of up to 44.3% in average accuracy across 12 datasets. Code is available at https://anonymous.4open.science/r/UQ-Merge-7CD7.
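
The three steps named in the abstract (select candidates, fix an order and a count, then merge) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the uncertainty measure, the ordering rule, the stopping criterion, or the merging operator. The sketch assumes a hypothetical label-free scoring function `uncertainty` where lower is better, ranks candidates by that score, and uses plain weight averaging with a greedy accept-if-improved rule. The names `uncertainty_guided_merge`, `average_weights`, and the `Weights` type are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a model checkpoint: parameter name -> flat values.
Weights = Dict[str, List[float]]


def average_weights(models: List[Weights]) -> Weights:
    """Element-wise mean of checkpoints that share the same parameter names."""
    n = len(models)
    return {
        name: [sum(vals) / n for vals in zip(*(m[name] for m in models))]
        for name in models[0]
    }


def uncertainty_guided_merge(
    candidates: List[Weights],
    uncertainty: Callable[[Weights], float],
) -> Tuple[Weights, int]:
    """Greedy sketch: rank by uncertainty, keep a candidate only if it helps.

    `uncertainty` is an assumed label-free score (lower is better); the
    acceptance rule and averaging operator are illustrative choices.
    """
    # Assumed ordering: least uncertain candidate first.
    ranked = sorted(candidates, key=uncertainty)
    chosen = [ranked[0]]
    best = average_weights(chosen)
    best_score = uncertainty(best)
    for cand in ranked[1:]:
        trial = average_weights(chosen + [cand])
        trial_score = uncertainty(trial)
        # Accept the candidate only if the merged model becomes less uncertain.
        if trial_score < best_score:
            chosen.append(cand)
            best, best_score = trial, trial_score
    return best, len(chosen)
```

In this sketch the number of merged models falls out of the acceptance rule rather than being set in advance, which mirrors the abstract's point that harmful candidates should be left out; the actual criterion used by UQ-Merge may differ.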

Anthology ID: 2025.findings-acl.73
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues: Findings | WS
Publisher: Association for Computational Linguistics
Pages: 1401–1417
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.73/
Cite (ACL): Huaizhi Qu, Xinyu Zhao, Jie Peng, Kwonjoon Lee, Behzad Dariush, and Tianlong Chen. 2025. UQ-Merge: Uncertainty Guided Multimodal Large Language Model Merging. In Findings of the Association for Computational Linguistics: ACL 2025, pages 1401–1417, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): UQ-Merge: Uncertainty Guided Multimodal Large Language Model Merging (Qu et al., Findings 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.73.pdf
