Jiahao Wu


2025

IntelliCockpitBench: A Comprehensive Benchmark to Evaluate VLMs for Intelligent Cockpit
Liang Lin | Siyuan Chai | Jiahao Wu | Hongbing Hu | Xiaotao Gu | Hao Hu | Fan Zhang | Wei Wang | Dan Zhang
Findings of the Association for Computational Linguistics: ACL 2025

The integration of sophisticated Vision-Language Models (VLMs) into vehicular systems, where they perform tasks such as Visual Question Answering (VQA), is revolutionizing vehicle interaction and safety. However, a critical gap persists: there is no comprehensive benchmark for multimodal VQA models in vehicular scenarios. To address this, we propose IntelliCockpitBench, a benchmark that encompasses diverse automotive scenarios. It includes images from front, side, and rear cameras, various road types, weather conditions, and interior views, integrating data from both moving and stationary states. Notably, all images and queries in the benchmark are verified for high levels of authenticity, ensuring the data accurately reflects real-world conditions. A sophisticated scoring methodology combining human and model-generated assessments enhances reliability and consistency. Our contributions include a diverse and authentic dataset for automotive VQA and a robust evaluation metric aligning human and machine assessments. All code and data can be found at https://github.com/Lane315/IntelliCockpitBench.