PunMemeCN: A Benchmark to Explore Vision-Language Models’ Understanding of Chinese Pun Memes

Zhijun Xu; Siyu Yuan; Yiqiao Zhang; Jingyu Sun; Tong Zheng; Deqing Yang

Abstract

Pun memes, which combine wordplay with visual elements, are a popular form of humor in Chinese online communication. Despite their prevalence, current Vision-Language Models (VLMs) have not been systematically evaluated on understanding and applying these culturally specific multimodal expressions. In this paper, we introduce PunMemeCN, a novel benchmark designed to assess VLMs’ capabilities in processing Chinese pun memes across three progressive tasks: pun meme detection, sentiment analysis, and chat-driven meme response. PunMemeCN consists of 1,959 Chinese memes (653 pun memes and 1,306 non-pun memes) with comprehensive annotations of punchlines, sentiments, and explanations, alongside 2,008 multi-turn chat conversations incorporating these memes. Our experiments indicate that state-of-the-art VLMs struggle with Chinese pun memes, particularly with homophone wordplay, even with Chain-of-Thought prompting. Notably, punchlines in memes can effectively conceal potentially harmful content from AI detection. These findings underscore the challenges in cross-cultural multimodal understanding and highlight the need for culture-specific approaches to humor comprehension in AI systems.

Anthology ID: 2025.emnlp-main.944
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 18705–18721
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.944/
Cite (ACL): Zhijun Xu, Siyu Yuan, Yiqiao Zhang, Jingyu Sun, Tong Zheng, and Deqing Yang. 2025. PunMemeCN: A Benchmark to Explore Vision-Language Models’ Understanding of Chinese Pun Memes. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 18705–18721, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): PunMemeCN: A Benchmark to Explore Vision-Language Models’ Understanding of Chinese Pun Memes (Xu et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.944.pdf
Checklist: 2025.emnlp-main.944.checklist.pdf
