RAGthoven at SemEval-2026 Task 1: A Multi-Stage Pipeline Walks Into a Benchmark and Barely Clears the Bar

Marek Suppa; Viktória Ondrejová; Lucia Ganajová; Gregor Karetka; Daniel Skala

Abstract

We present RAGthoven, our system for SemEval-2026 Task 1 (MuWaHaHa), Subtask A (multilingual constrained humor generation in English, Spanish, and Chinese). RAGthoven decomposes creative text generation into a multi-stage large language model (LLM) pipeline (Planner, Writer, Reflector, Judge) grounded in computational humor theories (Benign Violation Theory, Script-based Semantic Theory of Humor) and iteratively refined through prompt engineering across ten experiments. In our final configuration, we augment the Planner with retrieval-augmented generation (RAG) from a curated joke corpus, seeding generation with diverse joke mechanisms. We additionally explore an agentic variant that exposes the same four pipeline stages as tool-calling agents orchestrated by a model loop with a ConstraintAudit checker. While it achieves full constraint compliance, human pairwise evaluation did not reveal a significant quality advantage over the simpler non-agentic baseline. RAGthoven achieves Rank 1 in all three languages, with the strongest result in Spanish (Elo 1182, 42 points above the Gemini 2.5 Flash baseline). However, while the system leads in raw Elo in Spanish, it shares Rank 1 with the baseline in all three languages due to overlapping confidence intervals; in English and Chinese the gap narrows further, suggesting that elaborate multi-stage prompt engineering may offer diminishing returns once a strong frontier model is in the loop.
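
To give a rough sense of what a 42-point Elo gap means in pairwise terms, the sketch below converts a rating difference into an expected win rate. It assumes the conventional logistic Elo model with a scale of 400; the abstract does not state how the task organizers computed ratings, so the function name, the scale, and the resulting figure are illustrative assumptions rather than reported results.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected pairwise score of the higher-rated system.

    Assumption: the standard logistic Elo model with base 10 and the
    given scale. The shared task's actual rating procedure may differ.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


# Gap reported in the abstract for Spanish (system vs. baseline).
spanish_gap = 42.0
print(f"{elo_expected_score(spanish_gap):.3f}")  # prints 0.560
```

Under these assumptions, a 42-point lead corresponds to being preferred in roughly 56% of head-to-head comparisons, which is consistent with the abstract's remark that the confidence intervals overlap.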

Anthology ID: 2026.semeval-1.416
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 3343–3356
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.416/
Cite (ACL): Marek Suppa, Viktória Ondrejová, Lucia Ganajová, Gregor Karetka, and Daniel Skala. 2026. RAGthoven at SemEval-2026 Task 1: A Multi-Stage Pipeline Walks Into a Benchmark and Barely Clears the Bar. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 3343–3356, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): RAGthoven at SemEval-2026 Task 1: A Multi-Stage Pipeline Walks Into a Benchmark and Barely Clears the Bar (Suppa et al., SemEval 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.416.pdf