The Visual Iconicity Challenge: Evaluating Vision-Language Models on Sign Language Form–Meaning Mapping

Onur Keleş, Asli Ozyurek, Gerardo Ortega, Kadir Gökgöz, Esam Ghaleb


Abstract
Iconicity, the resemblance between linguistic form and meaning, is pervasive in sign languages, offering a natural testbed for visual grounding in vision–language models (VLMs). We introduce the Visual Iconicity Challenge, a video-based benchmark that adapts psycholinguistic measures to evaluate VLMs on three tasks: (i) phonological sign-form prediction, (ii) transparency (inferring meaning from visual form), and (iii) graded iconicity ratings. We assess 17 state-of-the-art VLMs in zero- and few-shot settings on Sign Language of the Netherlands and compare them to human baselines. VLMs mirror human phonological difficulty patterns (e.g., handshape harder than location) and achieve moderate to strong alignment with human iconicity ratings. However, they still fail to infer lexical meaning from visual form alone and show a systematic object-based bias that inverts the human preference for action-based signs. Crucially, models with stronger phonological form prediction correlate better with human iconicity judgments, indicating shared sensitivity to visually grounded structure. Our findings validate these diagnostic tasks, show that explicit reasoning narrows the open-to-closed-model calibration gap, and motivate human-centric signals for modelling iconicity in multimodal models.
Anthology ID:
2026.acl-long.1907
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
41101–41116
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1907/
Cite (ACL):
Onur Keleş, Asli Ozyurek, Gerardo Ortega, Kadir Gökgöz, and Esam Ghaleb. 2026. The Visual Iconicity Challenge: Evaluating Vision-Language Models on Sign Language Form–Meaning Mapping. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 41101–41116, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
The Visual Iconicity Challenge: Evaluating Vision-Language Models on Sign Language Form–Meaning Mapping (Keleş et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1907.pdf
Checklist:
 2026.acl-long.1907.checklist.pdf