PunMemeCN: A Benchmark to Explore Vision-Language Models’ Understanding of Chinese Pun Memes

Zhijun Xu, Siyu Yuan, Yiqiao Zhang, Jingyu Sun, Tong Zheng, Deqing Yang


Abstract
Pun memes, which combine wordplay with visual elements, are a popular form of humor in Chinese online communication. Despite their prevalence, current Vision-Language Models (VLMs) have not been systematically evaluated on their ability to understand and apply these culturally specific multimodal expressions. In this paper, we introduce PunMemeCN, a novel benchmark designed to assess VLMs’ capabilities in processing Chinese pun memes across three progressive tasks: pun meme detection, sentiment analysis, and chat-driven meme response. PunMemeCN consists of 1,959 Chinese memes (653 pun memes and 1,306 non-pun memes) with comprehensive annotations of punchlines, sentiments, and explanations, alongside 2,008 multi-turn chat conversations incorporating these memes. Our experiments indicate that state-of-the-art VLMs struggle with Chinese pun memes, particularly with homophone wordplay, even with Chain-of-Thought prompting. Notably, punchlines in memes can effectively conceal potentially harmful content from AI detection. These findings underscore the challenges in cross-cultural multimodal understanding and highlight the need for culture-specific approaches to humor comprehension in AI systems.
Anthology ID:
2025.emnlp-main.944
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
18705–18721
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.944/
Cite (ACL):
Zhijun Xu, Siyu Yuan, Yiqiao Zhang, Jingyu Sun, Tong Zheng, and Deqing Yang. 2025. PunMemeCN: A Benchmark to Explore Vision-Language Models’ Understanding of Chinese Pun Memes. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 18705–18721, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
PunMemeCN: A Benchmark to Explore Vision-Language Models’ Understanding of Chinese Pun Memes (Xu et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.944.pdf
Checklist:
 2025.emnlp-main.944.checklist.pdf