On Domain-Adaptive Post-Training for Multimodal Large Language Models

Daixuan Cheng, Shaohan Huang, Ziyu Zhu, Xintong Zhang, Xin Zhao, Zhongzhi Luan, Bo Dai, Zhenliang Zhang


Abstract
Adapting general multimodal large language models (MLLMs) to specific domains, such as scientific and industrial fields, is essential for their practical application. This paper systematically investigates domain adaptation of MLLMs via post-training, focusing on data synthesis, training pipeline, and task evaluation. (1) **Data Synthesis**: Using only open-source models, we develop a generate-then-filter pipeline that curates diverse visual instruction tasks from domain-specific image-caption pairs. The resulting data surpass data synthesized by manual rules or by strong closed-source models in enhancing domain-specific performance. (2) **Training Pipeline**: Unlike general MLLMs, which typically adopt a two-stage training paradigm, we find that a single-stage approach is more effective for domain adaptation. (3) **Task Evaluation**: We conduct extensive experiments in high-impact domains such as biomedicine, food, and remote sensing, post-training a variety of MLLMs and then evaluating their performance on various domain-specific tasks. Finally, we fully open-source our models, code, and data to encourage future research in this area.
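
The generate-then-filter pipeline described above can be illustrated with a minimal sketch. The helper names below (`generate_tasks` for an open-source MLLM prompted to propose visual instruction tasks from an image-caption pair, `score_consistency` for an open model used as the filter) and the threshold are hypothetical assumptions for illustration, not the paper's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ImageCaption:
    image_path: str
    caption: str


@dataclass
class InstructionTask:
    image_path: str
    instruction: str
    response: str


def synthesize(
    pairs: List[ImageCaption],
    generate_tasks: Callable[[ImageCaption], List[InstructionTask]],
    score_consistency: Callable[[InstructionTask, ImageCaption], float],
    threshold: float = 0.7,  # hypothetical filtering cutoff
) -> List[InstructionTask]:
    """Generate candidate visual-instruction tasks from image-caption pairs,
    then keep only the candidates the filter judges consistent with the pair."""
    kept: List[InstructionTask] = []
    for pair in pairs:
        for task in generate_tasks(pair):                    # generate step
            if score_consistency(task, pair) >= threshold:   # filter step
                kept.append(task)
    return kept
```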
Anthology ID:
2025.findings-emnlp.17
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
274–296
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.17/
DOI:
10.18653/v1/2025.findings-emnlp.17
Cite (ACL):
Daixuan Cheng, Shaohan Huang, Ziyu Zhu, Xintong Zhang, Xin Zhao, Zhongzhi Luan, Bo Dai, and Zhenliang Zhang. 2025. On Domain-Adaptive Post-Training for Multimodal Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 274–296, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
On Domain-Adaptive Post-Training for Multimodal Large Language Models (Cheng et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.17.pdf
Checklist:
2025.findings-emnlp.17.checklist.pdf