OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Xiangyu Zhao; Shengyuan Ding; Zicheng Zhang; Haian Huang; Maosongcao Maosongcao; Jiaqi Wang; Weiyun Wang; Xinyu Fang; Wenhai Wang; Guangtao Zhai; Hua Yang; Haodong Duan; Kai Chen

Abstract

Recent advancements in open-source multi-modal large language models (MLLMs) have primarily focused on enhancing foundational capabilities, leaving a significant gap in human preference alignment. This paper introduces OmniAlign-V, a comprehensive dataset of 200K high-quality training samples featuring diverse images, complex questions, and varied response formats to improve MLLMs’ alignment with human preferences. We also present MM-AlignBench, a human-annotated benchmark specifically designed to evaluate MLLMs’ alignment with human values. Experimental results show that finetuning MLLMs with OmniAlign-V, using Supervised Fine-Tuning (SFT) or Direct Preference Optimization (DPO), significantly enhances human preference alignment while maintaining or improving performance on standard VQA benchmarks, thereby preserving the models’ fundamental capabilities.

Anthology ID: 2025.acl-long.906
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 18490–18515
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.906/
Cite (ACL): Xiangyu Zhao, Shengyuan Ding, Zicheng Zhang, Haian Huang, Maosongcao Maosongcao, Jiaqi Wang, Weiyun Wang, Xinyu Fang, Wenhai Wang, Guangtao Zhai, Hua Yang, Haodong Duan, and Kai Chen. 2025. OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 18490–18515, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference (Zhao et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.906.pdf
