Hiroto Otake


2025

BannerBench: Benchmarking Vision Language Models for Multi-Ad Selection with Human Preferences
Hiroto Otake | Peinan Zhang | Yusuke Sakai | Masato Mita | Hiroki Ouchi | Taro Watanabe
Findings of the Association for Computational Linguistics: EMNLP 2025

Web banner advertisements, which are placed on websites to guide users to a targeted landing page (LP), are still often selected manually because human preferences play a central role in deciding which ads to deliver. To automate this process, we propose a new benchmark, BannerBench, to evaluate the human-preference-driven banner selection process using vision-language models (VLMs). The benchmark assesses alignment with human preferences on two tasks, a ranking task and a best-choice task, both using sets of five images derived from a single LP. Our experiments show that VLMs correlate moderately with human preferences on the ranking task. On the best-choice task, most VLMs perform close to chance level across various prompting strategies. These findings suggest that although VLMs have a basic understanding of human preferences, most of them struggle to pinpoint a single suitable option among many candidates.