An Improved, Strong Baseline for Pre-Trained Large Language Models as Task-Oriented Dialogue Systems

Sebastian Steindl, André Kestler, Ulrich Schäfer, Bernd Ludwig


Abstract
Large Language Models (LLMs) have recently been studied in the context of Task-Oriented Dialogue (TOD). However, previous research is inconclusive on their effectiveness: some studies claim that LLMs cannot perform the TOD task, while others make sophisticated additions to their setup and reach the opposite conclusion. In this work, we take a detailed look at previous results stating that LLMs perform insufficiently as TOD systems. Based on this analysis, we propose an updated, stronger baseline for the out-of-the-box performance of multiple LLMs as TOD systems. We introduce a Self-Checking mechanism as a simple yet effective component that drastically improves their performance. Our results show that newer, pre-trained LLMs can in fact act as TOD systems out of the box, challenging the previous understanding. We further show that LLMs can even perform competitively with fine-tuned models on certain metrics. Based on these findings, we propose directions for future research. Our code is published on GitHub.
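The abstract names a Self-Checking mechanism but does not describe its internals on this page. As a loose illustration only, a generic self-checking loop for an LLM-based TOD system might look like the sketch below; the `generate_turn` function, the VALID/INVALID protocol, and the `MAX_RETRIES` budget are all assumptions for illustration, not the authors' actual method.

```python
from typing import Callable

# All names here are illustrative assumptions; the abstract does not
# specify how the paper's Self-Checking mechanism is implemented.

LLM = Callable[[str], str]  # any function wrapping a chat-completion call

MAX_RETRIES = 3  # assumed retry budget


def generate_turn(llm: LLM, history: list[str], user_utterance: str) -> str:
    """Generate a TOD system response, then ask the model to check it."""
    prompt = "\n".join(history + [f"User: {user_utterance}", "System:"])
    response = ""
    for _ in range(MAX_RETRIES):
        response = llm(prompt)
        # Self-check: a second call asks the model to validate its own
        # output (e.g., well-formed and consistent with the dialogue).
        verdict = llm(
            "Check the following task-oriented dialogue response for "
            "formatting errors and consistency with the context. "
            "Answer only VALID or INVALID.\n"
            f"Context: {history}\nUser: {user_utterance}\n"
            f"Response: {response}"
        )
        if verdict.strip().upper().startswith("VALID"):
            break  # accept the first response the model deems valid
    return response  # otherwise fall back to the last attempt
```

Any chat-completion wrapper can be passed as `llm`; the key design choice in such a loop is spending one extra model call per turn to reject malformed or inconsistent responses before they reach the user.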
Anthology ID: 2025.findings-emnlp.610
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 11388–11398
URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.610/
DOI: 10.18653/v1/2025.findings-emnlp.610
Cite (ACL): Sebastian Steindl, André Kestler, Ulrich Schäfer, and Bernd Ludwig. 2025. An Improved, Strong Baseline for Pre-Trained Large Language Models as Task-Oriented Dialogue Systems. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 11388–11398, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): An Improved, Strong Baseline for Pre-Trained Large Language Models as Task-Oriented Dialogue Systems (Steindl et al., Findings 2025)
PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.610.pdf
Checklist: 2025.findings-emnlp.610.checklist.pdf