NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
Pranshu Pandya, Vatsal Gupta, Agney S Talwarr, Tushar Kataria, Dan Roth, Vivek Gupta
Abstract
Cognitive textual and visual reasoning tasks, including puzzles, series, and analogies, demand the ability to quickly reason, decipher, and evaluate patterns both textually and spatially. Due to extensive training on vast amounts of human-curated data, large language models (LLMs) and vision language models (VLMs) excel in common-sense reasoning tasks but still struggle with more complex reasoning that demands deeper cognitive understanding. We introduce NTSEBENCH, a new dataset designed to evaluate the cognitive multimodal reasoning and problem-solving skills of large models. The dataset contains 2,728 multiple-choice questions, accompanied by a total of 4,642 images, spanning 26 categories. These questions are drawn from the nationwide NTSE examination in India and feature a mix of visual and textual general aptitude challenges, designed to assess intelligence and critical thinking skills beyond mere rote learning. We establish baselines on the dataset using state-of-the-art LLMs and VLMs. To facilitate a comparison between open-source and proprietary models, we propose four distinct modeling strategies to handle the different modalities (text and images) in the dataset instances.
- Anthology ID:
- 2025.findings-naacl.204
- Volume:
- Findings of the Association for Computational Linguistics: NAACL 2025
- Month:
- April
- Year:
- 2025
- Address:
- Albuquerque, New Mexico
- Editors:
- Luis Chiruzzo, Alan Ritter, Lu Wang
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3680–3708
- URL:
- https://preview.aclanthology.org/landing_page/2025.findings-naacl.204/
- Cite (ACL):
- Pranshu Pandya, Vatsal Gupta, Agney S Talwarr, Tushar Kataria, Dan Roth, and Vivek Gupta. 2025. NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 3680–3708, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal):
- NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models (Pandya et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/landing_page/2025.findings-naacl.204.pdf