“What do you call a dog that is incontrovertibly true? Dogma”: Testing LLM Generalization through Humor

Alessio Cocchieri; Luca Ragazzi; Paolo Italiani; Giuseppe Tagliavini; Gianluca Moro

Abstract

Humor, requiring creativity and contextual understanding, is a hallmark of human intelligence, showcasing adaptability across linguistic scenarios. While recent advances in large language models (LLMs) demonstrate strong reasoning on various benchmarks, it remains unclear whether they truly adapt to new tasks like humans (i.e., generalize) or merely replicate memorized content. To explore this, we introduce Phunny, a new humor-based question-answering benchmark designed to assess LLMs’ reasoning through carefully crafted puns. Our dataset is manually curated to ensure novelty and minimize data contamination, providing a robust evaluation of LLMs’ linguistic comprehension. Experiments on pun comprehension, resolution, and generation reveal that most LLMs struggle with generalization, even on simple tasks, consistently underperforming the human baseline. Additionally, our detailed error analysis provides valuable insights to guide future research.

Anthology ID: 2025.acl-long.1117
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 22922–22937
URL: https://preview.aclanthology.org/landing_page/2025.acl-long.1117/
Cite (ACL): Alessio Cocchieri, Luca Ragazzi, Paolo Italiani, Giuseppe Tagliavini, and Gianluca Moro. 2025. “What do you call a dog that is incontrovertibly true? Dogma”: Testing LLM Generalization through Humor. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 22922–22937, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): “What do you call a dog that is incontrovertibly true? Dogma”: Testing LLM Generalization through Humor (Cocchieri et al., ACL 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.acl-long.1117.pdf
