PlaM: Training-Free Plateau-Guided Model Merging for Better Visual Grounding in MLLMs
Zijing Wang, YongKang Liu, Mingyang Wang, Ercong Nie, Deyuan Chen, Zhengjie Zhao, Shi Feng, Daling Wang, Xiaocui Yang, Yifei Zhang, Hinrich Schuetze
Abstract
Multimodal Large Language Models (MLLMs) rely on strong linguistic reasoning inherited from their base language models. However, multimodal instruction fine-tuning paradoxically degrades this textual reasoning capability, which in turn undermines multimodal performance. We propose a training-free framework to mitigate this degradation. Through layer-wise vision token masking, we reveal a common three-stage pattern in MLLMs: early-modal separation, mid-modal alignment, and late-modal degradation. By analyzing the behavior of MLLMs at each stage, we propose a plateau-guided model merging method that selectively injects base language model parameters into MLLMs. Experiments with five MLLMs on nine benchmarks demonstrate the effectiveness of our method. Attention-based analysis further reveals that merging shifts attention from diffuse, scattered patterns to focused localization on task-relevant visual regions. Our repository is available at https://github.com/wzj1718/PlaM.
- Anthology ID:
- 2026.findings-acl.1056
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 21019–21035
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1056/
- Cite (ACL):
- Zijing Wang, YongKang Liu, Mingyang Wang, Ercong Nie, Deyuan Chen, Zhengjie Zhao, Shi Feng, Daling Wang, Xiaocui Yang, Yifei Zhang, and Hinrich Schuetze. 2026. PlaM: Training-Free Plateau-Guided Model Merging for Better Visual Grounding in MLLMs. In Findings of the Association for Computational Linguistics: ACL 2026, pages 21019–21035, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- PlaM: Training-Free Plateau-Guided Model Merging for Better Visual Grounding in MLLMs (Wang et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1056.pdf