Visual Priming Effect on Large-scale Vision Language Models

Daiki Yoshida, Haruki Sakajo, Kazuki Hayashi, Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe


Abstract
Large-scale Vision-Language Models (LVLMs) integrate linguistic and visual information and demonstrate advanced task-solving capabilities. Because these models are derived from Large Language Models, they perform strongly on language tasks. However, the impact of additional visual information on model responses remains insufficiently understood. In this study, we focus on the priming effect, a psychological phenomenon, to investigate how visual information influences language task processing. We present intentionally designed images alongside two types of language tasks with different characteristics and analyze the changes in the model’s responses. Our experimental results show that model responses shift in the direction intended by the image, suggesting that LVLMs do not simply ignore visual information but actively incorporate it into language processing. Furthermore, the similarity between this behavior and priming effects observed in human cognition suggests that LVLMs may share certain aspects of human cognitive mechanisms.
Anthology ID:
2025.ranlp-1.160
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
1385–1395
URL:
https://preview.aclanthology.org/corrections-2026-01/2025.ranlp-1.160/
Cite (ACL):
Daiki Yoshida, Haruki Sakajo, Kazuki Hayashi, Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, and Taro Watanabe. 2025. Visual Priming Effect on Large-scale Vision Language Models. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 1385–1395, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Visual Priming Effect on Large-scale Vision Language Models (Yoshida et al., RANLP 2025)
PDF:
https://preview.aclanthology.org/corrections-2026-01/2025.ranlp-1.160.pdf