From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities
Shixin Jiang, Jiafeng Liang, Jiyuan Wang, Xuan Dong, Heng Chang, Weijiang Yu, Jinhua Du, Ming Liu, Bing Qin
Abstract
To tackle complex tasks in real-world scenarios, more researchers are focusing on Omni-MLLMs, which aim to achieve omni-modal understanding and generation. Beyond the constraints of any specific non-linguistic modality, Omni-MLLMs map various non-linguistic modalities into the embedding space of LLMs and enable the interaction and understanding of arbitrary combinations of modalities within a single model. In this paper, we systematically investigate relevant research and provide a comprehensive survey of Omni-MLLMs. Specifically, we first explain the four core components of Omni-MLLMs for unified multi-modal modeling with a meticulous taxonomy that offers novel perspectives. Then, we introduce the effective integration achieved through two-stage training and discuss the corresponding datasets as well as evaluation. Furthermore, we summarize the main challenges of current Omni-MLLMs and outline future directions. We hope this paper serves as an introduction for beginners and promotes the advancement of related research. Resources have been made publicly available at https://github.com/threegold116/Awesome-Omni-MLLMs.
- Anthology ID: 2025.findings-acl.453
- Volume: Findings of the Association for Computational Linguistics: ACL 2025
- Month: July
- Year: 2025
- Address: Vienna, Austria
- Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 8617–8652
- URL: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.453/
- Cite (ACL): Shixin Jiang, Jiafeng Liang, Jiyuan Wang, Xuan Dong, Heng Chang, Weijiang Yu, Jinhua Du, Ming Liu, and Bing Qin. 2025. From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities. In Findings of the Association for Computational Linguistics: ACL 2025, pages 8617–8652, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal): From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities (Jiang et al., Findings 2025)
- PDF: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.453.pdf