IntelliCockpitBench: A Comprehensive Benchmark to Evaluate VLMs for Intelligent Cockpit

Liang Lin; Siyuan Chai; Jiahao Wu; Hongbing Hu; Xiaotao Gu; Hao Hu; Fan Zhang (张帆); Wei Wang; Dan Zhang

Abstract

The integration of sophisticated Vision-Language Models (VLMs) in vehicular systems is revolutionizing vehicle interaction and safety, performing tasks such as Visual Question Answering (VQA). However, a critical gap persists due to the lack of a comprehensive benchmark for multimodal VQA models in vehicular scenarios. To address this, we propose IntelliCockpitBench, a benchmark that encompasses diverse automotive scenarios. It includes images from front, side, and rear cameras, various road types, weather conditions, and interior views, integrating data from both moving and stationary states. Notably, all images and queries in the benchmark are verified for high levels of authenticity, ensuring the data accurately reflects real-world conditions. A sophisticated scoring methodology combining human and model-generated assessments enhances reliability and consistency. Our contributions include a diverse and authentic dataset for automotive VQA and a robust evaluation metric aligning human and machine assessments. All code and data can be found at https://github.com/Lane315/IntelliCockpitBench.

Anthology ID: 2025.findings-acl.798
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 15453–15475
URL: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.798/
Cite (ACL): Liang Lin, Siyuan Chai, Jiahao Wu, Hongbing Hu, Xiaotao Gu, Hao Hu, Fan Zhang, Wei Wang, and Dan Zhang. 2025. IntelliCockpitBench: A Comprehensive Benchmark to Evaluate VLMs for Intelligent Cockpit. In Findings of the Association for Computational Linguistics: ACL 2025, pages 15453–15475, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): IntelliCockpitBench: A Comprehensive Benchmark to Evaluate VLMs for Intelligent Cockpit (Lin et al., Findings 2025)
PDF: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.798.pdf
