Seeing the Other Side: Diagnostic Tasks for Viewpoint Reasoning in Vision–Language Models

Makoto Takenaka; Hitomi Yanaka

Abstract

Humans can integrate multiple visual perspectives and infer how an object appears from unseen sides. This study investigates whether Large Vision-Language Models (LVLMs) exhibit a comparable ability for reference-grounded spatial reasoning. We propose two diagnostic tasks: Opposite-Side Reasoning, which asks whether two images show the same object from opposite viewpoints, and Viewpoint Identification, which asks for the viewpoint of a target image given a reference image and its label. An additional condition, Viewpoint Identification (no-ref), removes the reference information to reveal cases that are solvable without it, separating genuine reasoning from bias-driven shortcuts. Our evaluation shows that both open and proprietary LVLMs fall far short of human performance. Even state-of-the-art proprietary LVLMs that reach relatively high accuracy still answer many items correctly when the reference information is removed, suggesting that their success often relies on linguistic or dataset-driven priors rather than genuine reference-based reasoning. These findings indicate that current LVLMs have not yet achieved consistent, reference-grounded spatial reasoning. The datasets from this work will be released on the Hugging Face Hub to support future research on multimodal viewpoint reasoning and spatial understanding.

Anthology ID: 2026.lrec-main.737
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 9386–9395
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.737/
Cite (ACL): Makoto Takenaka and Hitomi Yanaka. 2026. Seeing the Other Side: Diagnostic Tasks for Viewpoint Reasoning in Vision–Language Models. International Conference on Language Resources and Evaluation, main:9386–9395.
Cite (Informal): Seeing the Other Side: Diagnostic Tasks for Viewpoint Reasoning in Vision–Language Models (Takenaka & Yanaka, LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.737.pdf