BannerBench: Benchmarking Vision Language Models for Multi-Ad Selection with Human Preferences
Hiroto Otake | Peinan Zhang | Yusuke Sakai | Masato Mita | Hiroki Ouchi | Taro Watanabe
Findings of the Association for Computational Linguistics: EMNLP 2025
Web banner advertisements, which are placed on websites to guide users to a targeted landing page (LP), are still often selected manually because human preference plays a central role in deciding which ads to deliver. To automate this process, we propose a new benchmark, BannerBench, for evaluating the human preference-driven banner selection process with vision-language models (VLMs). The benchmark measures alignment with human preferences on two tasks, a ranking task and a best-choice task, each using sets of five images derived from a single LP. Our experiments show that VLMs correlate moderately with human preferences on the ranking task, whereas on the best-choice task most VLMs perform close to chance level across various prompting strategies. These findings suggest that although VLMs have a basic understanding of human preferences, most of them struggle to pinpoint the single most suitable option among multiple candidates.
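To make the two-task setup concrete, the sketch below shows one plausible way to score a VLM per landing page, assuming each LP comes with a human preference ranking over its five candidate banners and a model-produced ranking or choice. The data format and function names here are illustrative assumptions, not the paper's released evaluation code.

```python
# Hypothetical scoring sketch for the two BannerBench tasks.
# Assumes five candidate banners per landing page (LP), a human
# preference ranking, and a VLM-produced ranking or single choice.
from scipy.stats import spearmanr

def score_ranking(human_ranks: list[int], model_ranks: list[int]) -> float:
    """Spearman correlation between the human and model rankings
    of the five banner candidates for one LP."""
    rho, _ = spearmanr(human_ranks, model_ranks)
    return rho

def score_best_choice(human_best: int, model_choice: int) -> float:
    """1.0 if the model picked the human-preferred banner, else 0.0.
    With five candidates per LP, chance-level accuracy is 0.2."""
    return float(model_choice == human_best)

# Example: one LP with five candidate banners (ranks 1 = most preferred).
human_ranks = [1, 2, 3, 4, 5]   # human preference order
model_ranks = [3, 2, 1, 5, 4]   # ranking produced by a VLM
print(score_ranking(human_ranks, model_ranks))           # 0.5 (moderate)
print(score_best_choice(human_best=0, model_choice=2))   # 0.0 (miss)
```

Corpus-level scores would then be the mean correlation (ranking task) and mean accuracy compared against the 0.2 chance baseline (best-choice task), averaged over all LPs.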