The Roast of GPT4o: Experiments in Generating, Detecting and Evaluating Celebrity Roast Comedy

Jens Lemmens, Jérémy Genette, Tony Veale, Walter Daelemans


Abstract
We present exploratory experiments on the comedic roasting capabilities of GPT4o. Specifically, we scraped @ComedyCentral roasts to design a survey in which participants blindly rated snippets of human and AI roasts and, in a second round of reviewing, predicted the author (AI or human). The results show no significant difference in how the barbs in human- and AI-generated roasts are rated. Further, a qualitative analysis showed that, although the model relies on specific recurrent phrases to imitate the style of human comedians, both generative LLM detectors and humans performed suboptimally in predicting the true author of the roasts.
Anthology ID:
2026.chum-1.5
Volume:
Proceedings of the 2nd Workshop on Computational Humor (CHum 2026)
Month:
July
Year:
2026
Address:
Online
Editors:
Ori Amir, Christian F. Hempelmann, Julia Rayz, Tiansi Dong, Tristan Miller
Venues:
chum | WS
Publisher:
Association for Computational Linguistics
Pages:
65–71
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.chum-1.5/
Cite (ACL):
Jens Lemmens, Jérémy Genette, Tony Veale, and Walter Daelemans. 2026. The Roast of GPT4o: Experiments in Generating, Detecting and Evaluating Celebrity Roast Comedy. In Proceedings of the 2nd Workshop on Computational Humor (CHum 2026), pages 65–71, Online. Association for Computational Linguistics.
Cite (Informal):
The Roast of GPT4o: Experiments in Generating, Detecting and Evaluating Celebrity Roast Comedy (Lemmens et al., chum 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.chum-1.5.pdf