The Visual Iconicity Challenge: Evaluating Vision-Language Models on Sign Language Form–Meaning Mapping
Onur Keleş, Asli Ozyurek, Gerardo Ortega, Kadir Gökgöz, Esam Ghaleb
Abstract
Iconicity, the resemblance between linguistic form and meaning, is pervasive in sign languages, offering a natural testbed for visual grounding in vision–language models (VLMs). We introduce the Visual Iconicity Challenge, a video-based benchmark that adapts psycholinguistic measures to evaluate VLMs on three tasks: (i) phonological sign-form prediction, (ii) transparency (inferring meaning from visual form), and (iii) graded iconicity ratings. We assess 17 state-of-the-art VLMs in zero- and few-shot settings on Sign Language of the Netherlands and compare them to human baselines. VLMs mirror human phonological difficulty patterns (e.g., handshape harder than location) and achieve moderate to strong alignment with human iconicity ratings. However, they still fail to infer lexical meaning from visual form alone and show a systematic object-based bias that inverts the human preference for action-based signs. Crucially, models with stronger phonological form prediction correlate better with human iconicity judgments, indicating shared sensitivity to visually grounded structure. Our findings validate these diagnostic tasks, show that explicit reasoning narrows the open-to-closed-model calibration gap, and motivate human-centric signals for modelling iconicity in multimodal models.
- Anthology ID:
- 2026.acl-long.1907
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 41101–41116
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1907/
- Cite (ACL):
- Onur Keleş, Asli Ozyurek, Gerardo Ortega, Kadir Gökgöz, and Esam Ghaleb. 2026. The Visual Iconicity Challenge: Evaluating Vision-Language Models on Sign Language Form–Meaning Mapping. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 41101–41116, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- The Visual Iconicity Challenge: Evaluating Vision-Language Models on Sign Language Form–Meaning Mapping (Keleş et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1907.pdf