Grounded Concreteness: Human-Like Concreteness Sensitivity in Vision–Language Models

Aryan Roy, Zekun Wang, Christopher J. MacLellan


Abstract
Do vision-language models (VLMs) develop more human-like sensitivity to linguistic concreteness than text-only large language models (LLMs) when both are evaluated with text-only prompts? We study this question with a controlled comparison between matched Llama text backbones and their Llama Vision counterparts across multiple model scales, treating multimodal pretraining as an ablation on perceptual grounding rather than as access to images at inference. We measure concreteness effects at three complementary levels: (i) output behavior, by relating question-level concreteness to QA accuracy; (ii) embedding geometry, by testing whether representations organize along a concreteness axis; and (iii) attention dynamics, by quantifying context reliance via attention-entropy measures. In addition, we elicit token-level concreteness ratings from models and evaluate alignment to human norm distributions, testing whether multimodal training yields more human-consistent judgments. Across benchmarks and scales, VLMs show larger gains on more concrete inputs, exhibit clearer concreteness-structured representations, produce ratings that better match human norms, and display systematically different attention patterns consistent with increased grounding.
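The abstract does not specify how the attention-entropy measures are computed. As a rough illustration only, one common way to quantify how diffusely a token attends to its context is the Shannon entropy of its attention distribution. The sketch below assumes this generic definition; the function names `attention_entropy` and `mean_attention_entropy` are hypothetical and are not taken from the paper.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution.

    `weights` is a sequence of non-negative attention weights for a single
    query token; it is renormalized here so it need not sum exactly to 1.
    This is a generic definition, assumed for illustration.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must sum to a positive value")
    entropy = 0.0
    for w in weights:
        p = w / total
        if p > 0:  # treat 0 * log(0) as 0
            entropy -= p * math.log(p)
    return entropy


def mean_attention_entropy(rows):
    """Average entropy over several query tokens (one row per token)."""
    return sum(attention_entropy(r) for r in rows) / len(rows)


# Hypothetical attention rows: one sharply focused, one fully uniform.
focused = [1.0, 0.0, 0.0, 0.0]
uniform = [0.25, 0.25, 0.25, 0.25]
print(attention_entropy(focused), attention_entropy(uniform))
```

Under this definition, low entropy means attention concentrates on a few context tokens, while high entropy means it is spread broadly across the context.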
Anthology ID:
2026.findings-acl.2081
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
41934–41950
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2081/
Cite (ACL):
Aryan Roy, Zekun Wang, and Christopher J. MacLellan. 2026. Grounded Concreteness: Human-Like Concreteness Sensitivity in Vision–Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 41934–41950, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Grounded Concreteness: Human-Like Concreteness Sensitivity in Vision–Language Models (Roy et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2081.pdf
Checklist:
 2026.findings-acl.2081.checklist.pdf