Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics

Yuriel Ryan; Rui Yang Tan; Kenny Tsu Wei Choo; Roy Ka-Wei Lee

doi:10.18653/v1/2025.findings-emnlp.755

Abstract

Understanding humor is a core aspect of social intelligence, yet it remains a significant challenge for Large Multimodal Models (LMMs). We introduce PixelHumor, a benchmark dataset of 2,800 annotated multi-panel comics designed to evaluate LMMs’ ability to interpret multimodal humor and recognize narrative sequences. Experiments with state-of-the-art LMMs reveal substantial gaps: for instance, top models achieve only 61% accuracy in panel sequencing, far below human performance. This underscores critical limitations in current models’ integration of visual and textual cues for coherent narrative and humor understanding. By providing a rigorous framework for evaluating multimodal contextual and narrative reasoning, PixelHumor aims to drive the development of LMMs that better engage in natural, socially aware interactions.

Anthology ID: 2025.findings-emnlp.755
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 14024–14050
URL: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.755/
DOI: 10.18653/v1/2025.findings-emnlp.755
Cite (ACL): Yuriel Ryan, Rui Yang Tan, Kenny Tsu Wei Choo, and Roy Ka-Wei Lee. 2025. Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 14024–14050, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics (Ryan et al., Findings 2025)
PDF: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.755.pdf
Checklist: 2025.findings-emnlp.755.checklist.pdf
