RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios

Fei Zhao; Chengqiang Lu; Yufan Shen; Qimeng Wang; Yicheng Qian; Haoxin Zhang; Yan Gao; Yiwu; Yao Hu; Zhen Wu; Shangyu Xing; Xinyu Dai

doi:10.18653/v1/2025.findings-emnlp.1039

Abstract

Although various multimodal multi-image evaluation datasets have emerged, they are primarily English-based, and no Chinese multi-image dataset has yet been released. To fill this gap, we introduce RealBench, the first Chinese multimodal multi-image dataset, which contains 9,393 samples and 69,910 images. RealBench distinguishes itself by incorporating real user-generated content, which keeps it closely relevant to real-world applications. The dataset also covers a wide variety of scenes, image resolutions, and image structures, further increasing the difficulty of multi-image understanding. We conduct a comprehensive evaluation on RealBench using 21 multimodal LLMs of different sizes, including closed-source models that support multi-image inputs as well as open-source visual and video models. The experimental results indicate that even the most powerful closed-source models still face challenges when handling multi-image Chinese scenarios. Moreover, there remains a noticeable performance gap of around 71.8% on average between open-source visual/video models and closed-source models. These results show that RealBench provides an important research foundation for further exploring multi-image understanding capabilities in the Chinese context. Our datasets will be publicly available.

Anthology ID: 2025.findings-emnlp.1039
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 19097–19115
URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1039/
DOI: 10.18653/v1/2025.findings-emnlp.1039
Cite (ACL): Fei Zhao, Chengqiang Lu, Yufan Shen, Qimeng Wang, Yicheng Qian, Haoxin Zhang, Yan Gao, Yiwu, Yao Hu, Zhen Wu, Shangyu Xing, and Xinyu Dai. 2025. RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 19097–19115, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): RealBench: A Chinese Multi-image Understanding Benchmark Close to Real-world Scenarios (Zhao et al., Findings 2025)
PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1039.pdf
Checklist: 2025.findings-emnlp.1039.checklist.pdf