Xiaofeng Jia
2026
MirrorQA: Benchmarking Multimodal LLMs on Mirror-Orientation Reasoning
Jingping Liu | Xingchen Peng | Yan Zhou | Ziyan Liu | Jie Zhai | Ronghao Chen | Huacan Wang | Xiaofeng Jia
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multimodal large language models (MLLMs) have achieved remarkable progress in recent years, yet their ability to perform left–right reasoning in mirror contexts, a fundamental element of spatial cognition, remains underexplored. To address this gap, we introduce MirrorQA, a manually constructed benchmark of 5,549 samples designed to evaluate MLLMs’ ability to distinguish left from right from a subject-centered perspective. MirrorQA is built through a three-stage pipeline (annotation, verification, and final review) to ensure high-quality labeling. Comprehensive evaluations of both open- and closed-source MLLMs show that even the best-performing models achieve only 65.40% accuracy, far below the 99.28% accuracy of humans. These results highlight the substantial difficulty current MLLMs have in reasoning about left and right, and point to promising directions for future research. MirrorQA and its code are publicly available at https://github.com/stargazer-zeno/MirrorQA.