SemVink: Advancing VLMs’ Semantic Understanding of Optical Illusions via Visual Global Thinking

Sifan Li; Yujun Cai; Yiwei Wang

Abstract

Vision-language models (VLMs) excel at semantic tasks but falter at a core human capability: detecting hidden content in optical illusions or AI-generated images through perceptual adjustments like zooming. We introduce HC-Bench, a benchmark of 112 images with hidden texts, objects, and illusions, revealing that leading VLMs achieve near-zero accuracy (0–5.36%) even with explicit prompting. Humans resolve such ambiguities instinctively, yet VLMs fail due to an overreliance on high-level semantics. We propose SemVink (Semantic Visual Thinking), which simply scales images to low resolutions; strikingly, this unlocks over 99% accuracy by eliminating redundant visual noise. The result exposes a critical architectural flaw: VLMs prioritize abstract reasoning over the low-level visual operations that are crucial for real-world robustness. Our work urges a shift toward hybrid models that integrate multi-scale processing, bridging the gap between computational vision and human cognition for applications in medical imaging, security, and beyond.
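
The abstract states only that images are scaled to low resolutions before being shown to the model. The following is a minimal sketch of that idea, not the authors' implementation: the block-averaging filter, the scale factor, and the toy image are all assumptions chosen for illustration. It shows how downscaling averages out fine local texture while a faint global pattern survives. In practice an image library's resize routine would likely be used instead of the hypothetical `downscale` helper.

```python
from typing import List

Image = List[List[float]]  # grayscale pixels in row-major order


def downscale(image: Image, factor: int) -> Image:
    """Shrink an image by averaging non-overlapping factor x factor blocks.

    Assumed stand-in for "scaling to low resolution"; rows and columns
    that do not fill a complete block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out_h = len(image) // factor
    out_w = len(image[0]) // factor
    result: Image = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            block = [
                image[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def make_toy_image(size: int = 8) -> Image:
    """Build a hypothetical test image: a strong checkerboard texture
    (local detail) plus a faint brightness offset on the left half
    (the "hidden" global pattern)."""
    image: Image = []
    for y in range(size):
        row = []
        for x in range(size):
            texture = 100.0 if (x + y) % 2 == 0 else 0.0
            hidden = 20.0 if x < size // 2 else 0.0
            row.append(texture + hidden)
        image.append(row)
    return image


small = downscale(make_toy_image(8), 4)
print(small)  # the checkerboard is averaged away; left/right offset remains
```

At full resolution the checkerboard swings by 100 units between neighboring pixels and dominates the 20-unit offset; after downscaling, only the left/right difference is left in the 2 x 2 result.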

Anthology ID: 2025.emnlp-main.1381
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 27155–27165
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1381/
Cite (ACL): Sifan Li, Yujun Cai, and Yiwei Wang. 2025. SemVink: Advancing VLMs’ Semantic Understanding of Optical Illusions via Visual Global Thinking. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 27155–27165, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): SemVink: Advancing VLMs’ Semantic Understanding of Optical Illusions via Visual Global Thinking (Li et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1381.pdf
Checklist: 2025.emnlp-main.1381.checklist.pdf