More Than Meets the Eye: Measuring the Semiotic Gap in Vision-Language Models via Semantic Anchorage

Wei He


Abstract
Vision-Language Models (VLMs) excel at photorealistic generation, yet often struggle to represent abstract meaning such as idiomatic interpretations of noun compounds. To study whether high visual fidelity interferes with idiomatic compositionality under visual abstraction, we introduce DIVA, a controlled benchmark that replaces high-fidelity visual detail with schematic iconicity by generating paired, sense-anchored visualizations for literal and idiomatic readings. We further propose Semantic Alignment Gap (𝛥), an architecture-agnostic metric that quantifies divergence between literal and idiomatic visual grounding. We additionally introduce a directional signed bias b(t) to separately measure the direction and strength of literal preference. Evaluating 8 recent VLMs, we reveal a consistent Literal Superiority Bias: model scale alone does not resolve literal preference, and increased visual fidelity is associated with weaker symbolic alignment, suggesting cognitive interference from hyper-realistic imagery. Our findings suggest that improving compositional understanding requires iconographic abstraction of visual input and anchoring interpretation and generation in intended meaning.
Anthology ID:
2026.acl-long.717
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
15753–15767
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.717/
Cite (ACL):
Wei He. 2026. More Than Meets the Eye: Measuring the Semiotic Gap in Vision-Language Models via Semantic Anchorage. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15753–15767, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
More Than Meets the Eye: Measuring the Semiotic Gap in Vision-Language Models via Semantic Anchorage (He, ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.717.pdf
Checklist:
2026.acl-long.717.checklist.pdf