Visual Priming Effect on Large-scale Vision Language Models
Daiki Yoshida | Haruki Sakajo | Kazuki Hayashi | Yusuke Sakai | Hidetaka Kamigaito | Katsuhiko Hayashi | Taro Watanabe
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, 2025
Large-scale Vision-Language Models (LVLMs) integrate linguistic and visual information and demonstrate advanced task-solving capabilities. Because these models are derived from Large Language Models, they retain strong performance on language tasks. However, how additional visual information affects their responses remains insufficiently understood. In this study, we draw on the priming effect, a phenomenon from psychology, to investigate how visual information influences language task processing. We present intentionally designed images alongside two types of language tasks with different characteristics and analyze how the model’s responses change. Our experimental results show that responses shift in the direction intended by the image, suggesting that LVLMs do not simply ignore visual information but actively incorporate it into language processing. Furthermore, the similarity between this behavior and priming effects observed in human cognition suggests that LVLMs may share certain aspects of human cognitive mechanisms.
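The comparison the abstract describes, posing the same language task with and without an intentionally designed priming image, can be sketched roughly as follows. This is a minimal illustration only: the model checkpoint, prompt template, fill-in-the-blank task, and image file are all hypothetical placeholders, not the materials or code used in the paper.

# Minimal sketch: query an LVLM on the same language task with and
# without a priming image, then compare the two answers.
# Model, prompt format, task, and image are hypothetical placeholders.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # hypothetical LVLM choice
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

task = "Fill in the blank with one word: S O _ P"  # hypothetical ambiguous task

def answer(image=None):
    """Query the model, optionally prepending a priming image."""
    if image is not None:
        prompt = f"USER: <image>\n{task}\nASSISTANT:"
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    else:
        prompt = f"USER: {task}\nASSISTANT:"
        inputs = processor.tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs.to(model.device), max_new_tokens=10)
    return processor.decode(out[0], skip_special_tokens=True)

baseline = answer()                          # no visual prime
primed = answer(Image.open("soap_bar.jpg"))  # hypothetical priming image
print("baseline:", baseline)
print("primed:  ", primed)

A shift from the baseline answer toward the concept depicted in the image (here, "soap" rather than "soup") would be the kind of response change the abstract refers to.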