See the World, Discover Knowledge: A Chinese Factuality Evaluation for Large Vision Language Models

Jihao Gu; Yingyao Wang; Pi Bu; Chen Wang; Ziming Wang; Tengtao Song; Donglai Wei; Jiale Yuan; Yingxiu Zhao; Yancheng He; Shilong Li; Jiaheng Liu; Meng Cao; Jun Song; Yingshui Tan; Xiang Li (李翔); Wenbo Su; Xiaoyong Zhu; Bo Zheng

Abstract

The evaluation of factual accuracy in large vision language models (LVLMs) has lagged behind their rapid development, making it difficult to fully assess these models’ knowledge capacity and reliability. In this paper, we introduce ChineseSimpleVQA, the first factuality-based visual question-answering benchmark in Chinese, which assesses the visual factuality of LVLMs across 8 major topics and 56 subtopics. The key features of this benchmark are its focus on the Chinese language, diverse knowledge types, multi-hop question construction, high-quality data, static consistency, and easy evaluation through short answers. Moreover, we contribute a rigorous data construction pipeline and decouple visual factuality into two parts: seeing the world (i.e., object recognition) and discovering knowledge. This decoupling allows us to analyze the capability boundaries and execution mechanisms of LVLMs. Finally, we evaluate 34 advanced open-source and closed-source models, revealing critical performance gaps in this field.

Anthology ID:: 2025.findings-acl.844
Volume:: Findings of the Association for Computational Linguistics: ACL 2025
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues:: Findings | WS
Publisher:: Association for Computational Linguistics
Pages:: 16422–16447
URL:: https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.844/
Cite (ACL):: Jihao Gu, Yingyao Wang, Pi Bu, Chen Wang, Ziming Wang, Tengtao Song, Donglai Wei, Jiale Yuan, Yingxiu Zhao, Yancheng He, Shilong Li, Jiaheng Liu, Meng Cao, Jun Song, Yingshui Tan, Xiang Li, Wenbo Su, Xiaoyong Zhu, and Bo Zheng. 2025. See the World, Discover Knowledge: A Chinese Factuality Evaluation for Large Vision Language Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 16422–16447, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: See the World, Discover Knowledge: A Chinese Factuality Evaluation for Large Vision Language Models (Gu et al., Findings 2025)
PDF:: https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.844.pdf