NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models

Pranshu Pandya; Vatsal Gupta; Agney S Talwarr; Tushar Kataria; Dan Roth; Vivek Gupta

Abstract

Cognitive textual and visual reasoning tasks, including puzzles, series, and analogies, demand the ability to quickly reason, decipher, and evaluate patterns both textually and spatially. Due to extensive training on vast amounts of human-curated data, large language models (LLMs) and vision language models (VLMs) excel at common-sense reasoning tasks, but still struggle with more complex reasoning that demands deeper cognitive understanding. We introduce NTSEBENCH, a new dataset designed to evaluate the cognitive multimodal reasoning and problem-solving skills of large models. The dataset contains 2,728 multiple-choice questions, accompanied by a total of 4,642 images, spanning 26 categories. These questions are drawn from the nationwide NTSE examination in India and feature a mix of visual and textual general aptitude challenges, designed to assess intelligence and critical thinking skills beyond mere rote learning. We establish baselines on the dataset using state-of-the-art LLMs and VLMs. To facilitate a comparison between open-source and proprietary models, we propose four distinct modeling strategies to handle the different modalities (text and images) in the dataset instances.

Anthology ID: 2025.findings-naacl.204
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3680–3708
URL: https://preview.aclanthology.org/landing_page/2025.findings-naacl.204/
Cite (ACL): Pranshu Pandya, Vatsal Gupta, Agney S Talwarr, Tushar Kataria, Dan Roth, and Vivek Gupta. 2025. NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 3680–3708, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models (Pandya et al., Findings 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.findings-naacl.204.pdf
