RotBench: Evaluating Multi-modal Large Language Models on Identifying Image Rotation

Tianyi Niu; Jaemin Cho; Elias Stengel-Eskin; Mohit Bansal

Abstract

We investigate to what extent Multimodal Large Language Models (MLLMs) can accurately identify the orientation of input images rotated 0°, 90°, 180°, and 270°. This task demands robust visual reasoning capabilities to detect rotational cues and contextualize spatial relationships within images, regardless of their orientation. To evaluate MLLMs on these abilities, we introduce RotBench, a 350-image manually-filtered benchmark comprising lifestyle, portrait, and landscape images. Despite the relatively simple nature of this task, we show that several state-of-the-art open and proprietary MLLMs, including GPT-5, o3, and Gemini-2.5-Pro, do not reliably identify rotation in input images. Providing models with auxiliary information—including captions, depth maps, and more—or using chain-of-thought prompting offers only small and inconsistent improvements. Our results indicate that most models are able to reliably identify right-side-up (0°) images, while certain models are able to identify upside-down (180°) images. None can reliably distinguish between 90° and 270° rotated images. Simultaneously showing the image rotated in different orientations leads to moderate performance gains for reasoning models, while a modified setup using voting improves the performance of weaker models. We further show that fine-tuning does not improve models’ ability to distinguish 90° and 270° rotations, despite substantially improving the identification of 180° images. Together, these results reveal a significant gap between MLLMs’ spatial reasoning capabilities and human perception in identifying rotation.
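The four-way task described above can be illustrated with a minimal sketch that rotates each image to 0°, 90°, 180°, and 270°, asks a predictor for the angle, and reports accuracy per rotation. This is not the authors' evaluation code: the names `rotate_ccw`, `per_angle_accuracy`, and `predict` are hypothetical, images are stood in for by nested lists, and the counter-clockwise convention is an assumption, since the abstract does not state a rotation direction.

```python
from collections import defaultdict

ANGLES = (0, 90, 180, 270)


def rotate_ccw(grid, angle):
    """Rotate a 2D list counter-clockwise by a multiple of 90 degrees.

    The counter-clockwise direction is an assumption made for this sketch.
    """
    if angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90")
    for _ in range((angle // 90) % 4):
        # One 90-degree counter-clockwise turn: transpose, then reverse row order.
        grid = [list(row) for row in zip(*grid)][::-1]
    return [list(row) for row in grid]


def per_angle_accuracy(images, predict):
    """Score a predictor separately on each of the four rotations.

    `predict` is a hypothetical callable that takes a rotated image and
    returns one of ANGLES; in practice it would wrap a model query.
    """
    if not images:
        raise ValueError("need at least one image")
    correct = defaultdict(int)
    for image in images:
        for angle in ANGLES:
            if predict(rotate_ccw(image, angle)) == angle:
                correct[angle] += 1
    return {angle: correct[angle] / len(images) for angle in ANGLES}


if __name__ == "__main__":
    toy_image = [[1, 2], [3, 4]]
    # A trivial baseline that always answers "upright".
    print(per_angle_accuracy([toy_image], lambda grid: 0))
```

Reporting accuracy per angle rather than as one pooled number is what makes a pattern like the one in the abstract visible, where a predictor handles 0° well but cannot separate 90° from 270°.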

Anthology ID: 2026.eacl-long.259
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 5546–5569
URL: https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.259/
Cite (ACL): Tianyi Niu, Jaemin Cho, Elias Stengel-Eskin, and Mohit Bansal. 2026. RotBench: Evaluating Multi-modal Large Language Models on Identifying Image Rotation. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5546–5569, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): RotBench: Evaluating Multi-modal Large Language Models on Identifying Image Rotation (Niu et al., EACL 2026)
PDF: https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.259.pdf
