Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study

DongGeon Lee, Joonwon Jang, Jihae Jeong, Hwanjo Yu


Abstract
Rapid deployment of vision-language models (VLMs) magnifies safety risks, yet most evaluations rely on artificial images. This study asks: how safe are current VLMs when confronted with the meme images that ordinary users share? To investigate this question, we introduce MemeSafetyBench, a 50,430-instance benchmark pairing real meme images with both harmful and benign instructions. Using a comprehensive safety taxonomy and LLM-based instruction generation, we assess multiple VLMs across single-turn and multi-turn interactions. We investigate how real-world memes influence harmful outputs, the mitigating effects of conversational context, and the relationship between model scale and safety metrics. Our findings demonstrate that VLMs are more vulnerable to meme-based harmful prompts than to synthetic or typographic images. Memes significantly increase harmful responses and decrease refusals compared to text-only inputs. Although multi-turn interactions provide partial mitigation, elevated vulnerability persists. These results highlight the need for ecologically valid evaluations and stronger safety mechanisms. MemeSafetyBench is publicly available at https://github.com/oneonlee/Meme-Safety-Bench.
Anthology ID:
2025.emnlp-main.1555
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
30533–30576
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1555/
Cite (ACL):
DongGeon Lee, Joonwon Jang, Jihae Jeong, and Hwanjo Yu. 2025. Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 30533–30576, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study (Lee et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1555.pdf
Checklist:
 2025.emnlp-main.1555.checklist.pdf