RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios
Fei Zhao, Chengqiang Lu, Yufan Shen, Qimeng Wang, Yicheng Qian, Haoxin Zhang, Yan Gao, Yiwu, Yao Hu, Zhen Wu, Shangyu Xing, Xinyu Dai
Abstract
Although various multimodal multi-image evaluation datasets have emerged, they are primarily based on English, and there has yet to be a Chinese multi-image dataset. To fill this gap, we introduce RealBench, the first Chinese multimodal multi-image dataset, which contains 9393 samples and 69910 images. RealBench distinguishes itself by incorporating real user-generated content, ensuring high relevance to real-world applications. Additionally, the dataset covers a wide variety of scenes, image resolutions, and image structures, further increasing the difficulty of multi-image understanding. Finally, we conduct a comprehensive evaluation on RealBench of 21 multimodal LLMs of different sizes, including closed-source models that support multi-image inputs as well as open-source visual and video models. The experimental results indicate that even the most powerful closed-source models still face challenges when handling multi-image Chinese scenarios. Moreover, there remains a noticeable performance gap of around 71.8% on average between open-source visual/video models and closed-source models. These results show that RealBench provides an important research foundation for further exploring multi-image understanding capabilities in the Chinese context. Our datasets will be publicly available.
- Anthology ID:
- 2025.findings-emnlp.1039
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2025
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 19097–19115
- URL:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1039/
- DOI:
- 10.18653/v1/2025.findings-emnlp.1039
- Cite (ACL):
- Fei Zhao, Chengqiang Lu, Yufan Shen, Qimeng Wang, Yicheng Qian, Haoxin Zhang, Yan Gao, Yiwu, Yao Hu, Zhen Wu, Shangyu Xing, and Xinyu Dai. 2025. RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 19097–19115, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios (Zhao et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1039.pdf