MAFMO: Multi-modal Adaptive Fusion with Meta-template Optimization for Vision-Language Models

Mingrui Xie, Lulu Xu, Junliang Du


Abstract
Vision-language models such as CLIP demonstrate exceptional generalization capabilities but face significant adaptation challenges due to parameter scale, prompt sensitivity, and cross-modal alignment difficulties. Existing approaches primarily focus on single-modality adjustments, leading to suboptimal alignment and limited generalization. We introduce MAFMO, a plug-and-play framework comprising: (1) a Harmonic Cross-Modal Adapter that enables efficient cross-modal knowledge transfer; (2) a Meta-Template Optimization module that dynamically generates input-dependent templates; and (3) a Cross-Modal Knowledge Synthesis mechanism that preserves critical structural relationships during adaptation. Extensive experiments across multiple fine-grained visual recognition benchmarks demonstrate that MAFMO consistently improves the performance of existing methods on both novel-class accuracy and harmonic mean, while remaining robust under various challenging conditions and adding minimal computational overhead.
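The abstract reports results on "novel classes and harmonic mean" without defining the latter. A minimal sketch follows, assuming the metric is the harmonic mean of base-class and novel-class accuracy as commonly used in base-to-novel evaluations of prompt-learning methods; the abstract does not confirm this. The function name `harmonic_mean` and the sample accuracies are hypothetical and are not taken from the paper.

```python
def harmonic_mean(base_acc: float, novel_acc: float) -> float:
    """Harmonic mean of two accuracies given in the same units (e.g. percent).

    Assumption: the inputs are base-class and novel-class accuracy; the
    abstract does not state which quantities the paper actually combines.
    """
    if base_acc < 0 or novel_acc < 0:
        raise ValueError("accuracies must be non-negative")
    total = base_acc + novel_acc
    if total == 0:
        # Both accuracies are zero; define the mean as zero to avoid 0/0.
        return 0.0
    return 2.0 * base_acc * novel_acc / total


if __name__ == "__main__":
    # Illustrative numbers only, not results reported in the paper.
    print(round(harmonic_mean(80.0, 60.0), 2))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a method cannot score well on it by raising one accuracy at the expense of the other.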
Anthology ID:
2025.findings-emnlp.953
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
17576–17585
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.953/
DOI:
10.18653/v1/2025.findings-emnlp.953
Cite (ACL):
Mingrui Xie, Lulu Xu, and Junliang Du. 2025. MAFMO: Multi-modal Adaptive Fusion with Meta-template Optimization for Vision-Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 17576–17585, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
MAFMO: Multi-modal Adaptive Fusion with Meta-template Optimization for Vision-Language Models (Xie et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.953.pdf
Checklist:
 2025.findings-emnlp.953.checklist.pdf