Gregor Karetka
2026
RAGthoven at SemEval-2026 Task 1: A Multi-Stage Pipeline Walks Into a Benchmark and Barely Clears the Bar
Marek Suppa | Viktória Ondrejová | Lucia Ganajová | Gregor Karetka | Daniel Skala
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
We present \textsc{RAGthoven}, our system for SemEval-2026 Task~1 (MuWaHaHa), Subtask~A (multilingual constrained humor generation in English, Spanish, and Chinese). \textsc{RAGthoven} decomposes creative text generation into a multi-stage large language model (LLM) pipeline (\textit{Planner}, \textit{Writer}, \textit{Reflector}, \textit{Judge}) grounded in computational humor theories (Benign Violation Theory, Script-based Semantic Theory of Humor) and iteratively refined through prompt engineering across ten experiments. In our final configuration, we augment the Planner with retrieval-augmented generation (RAG) from a curated joke corpus, seeding generation with diverse joke mechanisms. We additionally explore an agentic variant that exposes the same four pipeline stages as tool-calling agents orchestrated by a model loop with a \textsc{ConstraintAudit} checker. While it achieves full constraint compliance, human pairwise evaluation did not reveal a significant quality advantage over the simpler non-agentic baseline. \textsc{RAGthoven} achieves Rank~1 in all three languages, with the strongest result in Spanish (Elo 1182, 42 points above the Gemini~2.5~Flash baseline). However, while the system leads in raw Elo in Spanish, it shares Rank~1 with the baseline in all three languages due to overlapping confidence intervals; in English and Chinese the gap narrows further, suggesting that elaborate multi-stage prompt engineering may offer diminishing returns once a strong frontier model is in the loop.
2025
RAGthoven at SemEval 2025 - Task 2: Enhancing Entity-Aware Machine Translation with Large Language Models, Retrieval Augmented Generation and Function Calling
Demetris Skottis | Gregor Karetka | Marek Suppa
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper presents a system for SemEval 2025 Task 2 on entity-aware machine translation, integrating GPT-4o with Wikidata-based translations, retrieval augmented generation (RAG), and function calling. Implemented in RAGthoven, a lightweight yet powerful toolkit, our approach enriches source sentences with real-time external knowledge to address challenging or culturally specific named entities. Experiments on translation from English into ten target languages show notable gains in translation quality, illustrating how LLM-based translation pipelines can leverage knowledge sources with minimal overhead. The simplicity of the approach makes it a strong baseline for future research in entity-focused machine translation.
RAGthoven: A Configurable Toolkit for RAG-enabled LLM Experimentation
Gregor Karetka | Demetris Skottis | Lucia Dutková | Peter Hraška | Marek Suppa
Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations
Large Language Models (LLMs) have significantly altered the landscape of Natural Language Processing (NLP), having topped the benchmarks of many standard tasks and problems, particularly when used in combination with Retrieval Augmented Generation (RAG). Despite the impressive performance and relative simplicity of this combination, its use as a baseline method has not been extensive. One of the reasons might be that adapting and optimizing RAG-based pipelines for specific NLP tasks generally requires custom development, which is difficult to scale. In this work we introduce RAGthoven, a tool for automatic evaluation of RAG-based pipelines. It provides a simple yet powerful abstraction, which allows the user to start the evaluation process with nothing more than a single configuration file. To demonstrate its usefulness, we conduct three case studies spanning text classification, question answering, and code generation use cases. We release the code, as well as the documentation and tutorials, at https://github.com/ragthoven-dev/ragthoven