High-quality datasets are essential for building effective task-oriented dialogue (TOD) systems. Existing TOD datasets often present overly simplified interactions, in which users incrementally express straightforward requests that can be handled with basic slot-value dialogue states, such as “hotel-area = east.” However, this does not reflect real-life scenarios, where users may express complex constraints and preferences. To address this gap, we propose SQLWOZ, a novel TOD dataset designed to capture complex, real-world user requirements. The user requirements in SQLWOZ fall into four categories: 1) multiple values for a slot, 2) excluded values for a slot, 3) preferred or prioritized values, and 4) conditional values that depend on other constraints. We use SQL statements as a formalized and expressive representation of dialogue states in SQLWOZ. To evaluate the dataset, we adapt large language models as dialogue agents and conduct extensive experiments on SQL-based dialogue state tracking, dialogue response generation, and end-to-end TOD tasks. The experimental results demonstrate the complexity and quality of SQLWOZ, establishing it as a new benchmark for advancing TOD research.
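As a rough illustration of what such SQL-based dialogue states might look like, the sketch below renders each of the four requirement categories as an example query. The table and column names (hotel, area, stars, parking) are assumptions extrapolated from the “hotel-area = east” example above, not taken from SQLWOZ itself.

```python
# Illustrative sketch only: possible SQL-style dialogue states for the four
# requirement categories described in the abstract. The schema (hotel, area,
# stars, parking) is assumed for the sake of the example.
example_states = {
    # 1) multiple values for a slot
    "multiple_values": "SELECT * FROM hotel WHERE area IN ('east', 'centre')",
    # 2) excluded values for a slot
    "excluded_values": "SELECT * FROM hotel WHERE area != 'north'",
    # 3) preferred or prioritized values (preference expressed via ordering)
    "preferred_values": "SELECT * FROM hotel WHERE area = 'east' "
                        "ORDER BY (stars = 4) DESC",
    # 4) conditional values that depend on other constraints
    "conditional_values": "SELECT * FROM hotel WHERE "
                          "(parking = 'yes' AND area = 'east') OR "
                          "(parking = 'no' AND area = 'centre')",
}

for category, sql in example_states.items():
    print(f"{category}: {sql}")
```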
Task-oriented dialogue (TOD) systems are predominantly composed of several functional modules (e.g., dialogue state tracking, dialogue policy, natural language generation), whether they follow pipeline or end-to-end architectures. However, this modular design not only relies heavily on massive fully annotated data but also suffers from intrinsic drawbacks such as serious error accumulation, poor generalization, high customization cost, and low fault tolerance. In this paper, we rethink the architecture of task-oriented dialogue systems and propose a novel, fully zero-shot autonomous TOD agent, named AutoTOD, in which all the delicate modules of traditional TOD systems are discarded; all it needs is a general-purpose instruction-following language model (e.g., GPT-4). AutoTOD leverages only a simple instruction schema consisting of task descriptions and external APIs, and it autonomously decides what to do at each dialogue turn, including asking for information, calling APIs, summarizing API results, and correcting previous mistakes. Moreover, we propose a simulation-based evaluation framework to better validate the abilities of TOD models in real-life scenarios. Extensive experiments on the MultiWOZ and SGD datasets show the superior task-completion ability and flexible language skills of AutoTOD.
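To make the instruction-schema idea concrete, the following minimal sketch outlines how a zero-shot agent loop of this kind could be wired up. The schema text, the action convention, and the helpers call_llm and query_hotel_db are hypothetical stand-ins for illustration, not the actual AutoTOD prompt or API.

```python
# Hypothetical sketch of an instruction-schema-driven TOD agent loop.
# call_llm and query_hotel_db are placeholders for any instruction-following
# model and any external domain API; the schema wording is assumed.
INSTRUCTION_SCHEMA = """You are a task-oriented dialogue agent.
Task: help the user find and book a hotel.
Available API:
  query_hotel_db(area, stars) -> list of matching hotels
At each turn, either ask the user for missing information, call the API,
summarize API results, or correct earlier mistakes."""

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a general-purpose instruction-following LLM")

def query_hotel_db(area: str, stars: int) -> list[dict]:
    raise NotImplementedError("plug in the external domain API")

def dialogue_turn(history: list[str], user_utterance: str) -> str:
    """Run one dialogue turn: the model decides whether to ask, call an API,
    summarize results, or reply directly."""
    history.append(f"User: {user_utterance}")
    prompt = INSTRUCTION_SCHEMA + "\n" + "\n".join(history) + "\nAgent:"
    action = call_llm(prompt)
    # Assumed convention: the model prefixes API calls with "CALL".
    if action.startswith("CALL query_hotel_db"):
        results = query_hotel_db(area="east", stars=4)  # arguments would be parsed from `action`
        prompt += f" {action}\nAPI results: {results}\nAgent:"
        action = call_llm(prompt)  # second call summarizes the API results
    history.append(f"Agent: {action}")
    return action
```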