MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang, Yahan Yu, Jiahua Dong, Chenxing Li, Dan Su, Chenhui Chu, Dong Yu
Abstract
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also empower a diverse range of MM tasks. In this paper, we provide a comprehensive survey aimed at facilitating further research of MM-LLMs. Initially, we outline general design formulations for model architecture and training pipeline. Subsequently, we introduce a taxonomy encompassing 126 MM-LLMs, each characterized by its specific formulations. Furthermore, we review the performance of selected MM-LLMs on mainstream benchmarks and summarize key training recipes to enhance the potency of MM-LLMs. Finally, we explore promising directions for MM-LLMs while concurrently maintaining a [real-time tracking website](https://mm-llms.github.io/) for the latest developments in the field. We hope that this survey contributes to the ongoing advancement of the MM-LLMs domain.- Anthology ID:
- 2024.findings-acl.738
- Volume:
- Findings of the Association for Computational Linguistics ACL 2024
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand and virtual meeting
- Editors:
- Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 12401–12430
- Language:
- URL:
- https://aclanthology.org/2024.findings-acl.738
- DOI:
- Cite (ACL):
- Duzhen Zhang, Yahan Yu, Jiahua Dong, Chenxing Li, Dan Su, Chenhui Chu, and Dong Yu. 2024. MM-LLMs: Recent Advances in MultiModal Large Language Models. In Findings of the Association for Computational Linguistics ACL 2024, pages 12401–12430, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
- Cite (Informal):
- MM-LLMs: Recent Advances in MultiModal Large Language Models (Zhang et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.findings-acl.738.pdf