André Kestler


2025

An Improved, Strong Baseline for Pre-Trained Large Language Models as Task-Oriented Dialogue Systems
Sebastian Steindl | André Kestler | Ulrich Schäfer | Bernd Ludwig
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Language Models (LLMs) have recently been studied in the context of Task-Oriented Dialogues (TOD). However, previous research is inconclusive about their effectiveness: some studies claim that LLMs are unable to perform the TOD task, while others make sophisticated additions to their setup and come to the opposite conclusion. In this work, we take a detailed look at previous results that state LLMs perform insufficiently as TOD systems. As a result, we propose an updated, stronger baseline for the out-of-the-box performance of multiple LLMs as TOD systems. We introduce a Self-Checking mechanism as a simple yet effective component that drastically improves their performance. Our results show that newer, pre-trained LLMs can, in fact, perform as TOD systems out of the box, challenging the previous understanding. We show that LLMs can even perform competitively with fine-tuned models on certain metrics. Based on this, we propose directions for future research. Our code is published on GitHub.
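The abstract does not describe how the Self-Checking mechanism works; as a rough illustration only, the sketch below shows one generic way a self-check could be wrapped around an out-of-the-box LLM in a TOD loop: draft a response, let the model verify it against the slot schema, and revise if problems are found. The function names, prompts, and retry logic are assumptions for illustration, not the authors' implementation.

```python
from typing import Callable


def respond_with_self_check(
    generate: Callable[[str], str],  # any prompt-in / text-out LLM call (assumed interface)
    dialogue_history: str,
    schema: str,
    max_retries: int = 2,
) -> str:
    """Hypothetical self-checking loop: draft, verify, and revise a TOD response."""
    # Step 1: draft a response and belief state from the dialogue context.
    draft = generate(
        f"Dialogue so far:\n{dialogue_history}\n"
        f"Slot schema:\n{schema}\n"
        "Produce the next system response together with the tracked belief state."
    )
    for _ in range(max_retries):
        # Step 2: ask the model to check its own draft against the schema.
        verdict = generate(
            f"Dialogue so far:\n{dialogue_history}\n"
            f"Slot schema:\n{schema}\n"
            f"Candidate response:\n{draft}\n"
            "Check the candidate: are all slot names and values valid under the schema "
            "and consistent with the dialogue? Answer OK, or list the errors."
        )
        if verdict.strip().upper().startswith("OK"):
            return draft
        # Step 3: revise the draft using the model's own critique.
        draft = generate(
            f"Dialogue so far:\n{dialogue_history}\n"
            f"Candidate response:\n{draft}\n"
            f"Detected problems:\n{verdict}\n"
            "Rewrite the response so that these problems are fixed."
        )
    return draft
```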