Iñigo Casanueva

Also published as: Inigo Casanueva


EVI: Multilingual Spoken Dialogue Tasks and Dataset for Knowledge-Based Enrolment, Verification, and Identification
Georgios Spithourakis | Ivan Vulić | Michał Lis | Inigo Casanueva | Paweł Budzianowski
Findings of the Association for Computational Linguistics: NAACL 2022

Knowledge-based authentication is crucial for task-oriented spoken dialogue systems that offer personalised and privacy-focused services. Such systems should be able to enrol (E), verify (V), and identify (I) new and recurring users based on their personal information, e.g. postcode, name, and date of birth. In this work, we formalise the three authentication tasks and their evaluation protocols, and we present EVI, a challenging spoken multilingual dataset with 5,506 dialogues in English, Polish, and French. Our proposed models set the first competitive benchmarks, explore the challenges of multilingual natural language processing of spoken dialogue, and set directions for future research.

NLU++: A Multi-Label, Slot-Rich, Generalisable Dataset for Natural Language Understanding in Task-Oriented Dialogue
Inigo Casanueva | Ivan Vulić | Georgios Spithourakis | Paweł Budzianowski
Findings of the Association for Computational Linguistics: NAACL 2022

We present NLU++, a novel dataset for natural language understanding (NLU) in task-oriented dialogue (ToD) systems, with the aim to provide a much more challenging evaluation environment for dialogue NLU models, up to date with the current application and industry requirements. NLU++ is divided into two domains (BANKING and HOTELS) and brings several crucial improvements over current commonly used NLU datasets. 1) NLU++ provides fine-grained domain ontologies with a large set of challenging multi-intent sentences combined with finer-grained and thus more challenging slot sets. 2) The ontology is divided into domain-specific and generic (i.e., domain-universal) intents that overlap across domains, promoting cross-domain reusability of annotated examples. 3) The dataset design has been inspired by the problems observed in industrial ToD systems, and 4) it has been collected, filtered and carefully annotated by dialogue NLU experts, yielding high-quality annotated data. Finally, we benchmark a series of current state-of-the-art NLU models on NLU++; the results demonstrate the challenging nature of the dataset, especially in low-data regimes, and call for further research on ToD NLU.


ConvFiT: Conversational Fine-Tuning of Pretrained Language Models
Ivan Vulić | Pei-Hao Su | Samuel Coope | Daniela Gerz | Paweł Budzianowski | Iñigo Casanueva | Nikola Mrkšić | Tsung-Hsien Wen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Transformer-based language models (LMs) pretrained on large text collections are proven to store a wealth of semantic knowledge. However, 1) they are not effective as sentence encoders when used off-the-shelf, and 2) thus typically lag behind conversationally pretrained (e.g., via response selection) encoders on conversational tasks such as intent detection (ID). In this work, we propose ConvFiT, a simple and efficient two-stage procedure which turns any pretrained LM into a universal conversational encoder (after Stage 1 ConvFiT-ing) and task-specialised sentence encoder (after Stage 2). We demonstrate that 1) full-blown conversational pretraining is not required, and that LMs can be quickly transformed into effective conversational encoders with much smaller amounts of unannotated data; 2) pretrained LMs can be fine-tuned into task-specialised sentence encoders, optimised for the fine-grained semantics of a particular task. Consequently, such specialised sentence encoders allow for treating ID as a simple semantic similarity task based on interpretable nearest neighbours retrieval. We validate the robustness and versatility of the ConvFiT framework with such similarity-based inference on the standard ID evaluation sets: ConvFiT-ed LMs achieve state-of-the-art ID performance across the board, with particular gains in the most challenging, few-shot setups.


Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI
Tsung-Hsien Wen | Asli Celikyilmaz | Zhou Yu | Alexandros Papangelis | Mihail Eric | Anuj Kumar | Iñigo Casanueva | Rushin Shah
Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI

Efficient Intent Detection with Dual Sentence Encoders
Iñigo Casanueva | Tadas Temčinas | Daniela Gerz | Matthew Henderson | Ivan Vulić
Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI

Building conversational systems in new domains and with added functionality requires resource-efficient models that work under low-data regimes (i.e., in few-shot setups). Motivated by these requirements, we introduce intent detection methods backed by pretrained dual sentence encoders such as USE and ConveRT. We demonstrate the usefulness and wide applicability of the proposed intent detectors, showing that: 1) they outperform intent detectors based on fine-tuning the full BERT-Large model or using BERT as a fixed black-box encoder on three diverse intent detection data sets; 2) the gains are especially pronounced in few-shot setups (i.e., with only 10 or 30 annotated examples per intent); 3) our intent detectors can be trained in a matter of minutes on a single CPU; and 4) they are stable across different hyperparameter settings. In hope of facilitating and democratizing research focused on intention detection, we release our code, as well as a new challenging single-domain intent detection dataset comprising 13,083 annotated examples over 77 intents.

ConveRT: Efficient and Accurate Conversational Representations from Transformers
Matthew Henderson | Iñigo Casanueva | Nikola Mrkšić | Pei-Hao Su | Tsung-Hsien Wen | Ivan Vulić
Findings of the Association for Computational Linguistics: EMNLP 2020

General-purpose pretrained sentence encoders such as BERT are not ideal for real-world conversational AI applications; they are computationally heavy, slow, and expensive to train. We propose ConveRT (Conversational Representations from Transformers), a pretraining framework for conversational tasks satisfying all the following requirements: it is effective, affordable, and quick to train. We pretrain using a retrieval-based response selection task, effectively leveraging quantization and subword-level parameterization in the dual encoder to build a lightweight memory- and energy-efficient model. We show that ConveRT achieves state-of-the-art performance across widely established response selection tasks. We also demonstrate that the use of extended dialog history as context yields further performance gains. Finally, we show that pretrained representations from the proposed encoder can be transferred to the intent classification task, yielding strong results across three diverse data sets. ConveRT trains substantially faster than standard sentence encoders or previous state-of-the-art dual encoders. With its reduced size and superior performance, we believe this model promises wider portability and scalability for Conversational AI applications.


Data Collection and End-to-End Learning for Conversational AI
Tsung-Hsien Wen | Pei-Hao Su | Paweł Budzianowski | Iñigo Casanueva | Ivan Vulić
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): Tutorial Abstracts

A fundamental long-term goal of conversational AI is to merge two main dialogue system paradigms into a standalone multi-purpose system. Such a system should be capable of conversing about arbitrary topics (Paradigm 1: open-domain dialogue systems), and simultaneously assist humans with completing a wide range of tasks with well-defined semantics such as restaurant search and booking, customer service applications, or ticket bookings (Paradigm 2: task-based dialogue systems).The recent developmental leaps in conversational AI technology are undoubtedly linked to more and more sophisticated deep learning algorithms that capture patterns in increasing amounts of data generated by various data collection mechanisms. The goal of this tutorial is therefore twofold. First, it aims at familiarising the research community with the recent advances in algorithmic design of statistical dialogue systems for both open-domain and task-based dialogue paradigms. The focus of the tutorial is on recently introduced end-to-end learning for dialogue systems and their relation to more common modular systems. In theory, learning end-to-end from data offers seamless and unprecedented portability of dialogue systems to a wide spectrum of tasks and languages. From a practical point of view, there are still plenty of research challenges and opportunities remaining: in this tutorial we analyse this gap between theory and practice, and introduce the research community with the main advantages as well as with key practical limitations of current end-to-end dialogue learning.The critical requirement of each statistical dialogue system is the data at hand. The system cannot provide assistance for the task without having appropriate task-related data to learn from. Therefore, the second major goal of this tutorial is to provide a comprehensive overview of the current approaches to data collection for dialogue, and analyse the current gaps and challenges with diverse data collection protocols, as well as their relation to and current limitations of data-driven end-to-end dialogue modeling. We will again analyse this relation and limitations both from research and industry perspective, and provide key insights on the application of state-of-the-art methodology into industry-scale conversational AI systems.

PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking
Matthew Henderson | Ivan Vulić | Iñigo Casanueva | Paweł Budzianowski | Daniela Gerz | Sam Coope | Georgios Spithourakis | Tsung-Hsien Wen | Nikola Mrkšić | Pei-Hao Su
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

We present PolyResponse, a conversational search engine that supports task-oriented dialogue. It is a retrieval-based approach that bypasses the complex multi-component design of traditional task-oriented dialogue systems and the use of explicit semantics in the form of task-specific ontologies. The PolyResponse engine is trained on hundreds of millions of examples extracted from real conversations: it learns what responses are appropriate in different conversational contexts. It then ranks a large index of text and visual responses according to their similarity to the given context, and narrows down the list of relevant entities during the multi-turn conversation. We introduce a restaurant search and booking system powered by the PolyResponse engine, currently available in 8 different languages.

Training Neural Response Selection for Task-Oriented Dialogue Systems
Matthew Henderson | Ivan Vulić | Daniela Gerz | Iñigo Casanueva | Paweł Budzianowski | Sam Coope | Georgios Spithourakis | Tsung-Hsien Wen | Nikola Mrkšić | Pei-Hao Su
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Despite their popularity in the chatbot literature, retrieval-based models have had modest impact on task-oriented dialogue systems, with the main obstacle to their application being the low-data regime of most task-oriented dialogue tasks. Inspired by the recent success of pretraining in language modelling, we propose an effective method for deploying response selection in task-oriented dialogue. To train response selection models for task-oriented dialogue tasks, we propose a novel method which: 1) pretrains the response selection model on large general-domain conversational corpora; and then 2) fine-tunes the pretrained model for the target dialogue domain, relying only on the small in-domain dataset to capture the nuances of the given dialogue domain. Our evaluation on five diverse application domains, ranging from e-commerce to banking, demonstrates the effectiveness of the proposed training method.

A Repository of Conversational Datasets
Matthew Henderson | Paweł Budzianowski | Iñigo Casanueva | Sam Coope | Daniela Gerz | Girish Kumar | Nikola Mrkšić | Georgios Spithourakis | Pei-Hao Su | Ivan Vulić | Tsung-Hsien Wen
Proceedings of the First Workshop on NLP for Conversational AI

Progress in Machine Learning is often driven by the availability of large datasets, and consistent evaluation metrics for comparing modeling approaches. To this end, we present a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using 1-of-100 accuracy. The repository contains scripts that allow researchers to reproduce the standard datasets, or to adapt the pre-processing and data filtering steps to their needs. We introduce and evaluate several competitive baselines for conversational response selection, whose implementations are shared in the repository, as well as a neural encoder model that is trained on the entire training set.


Feudal Reinforcement Learning for Dialogue Management in Large Domains
Iñigo Casanueva | Paweł Budzianowski | Pei-Hao Su | Stefan Ultes | Lina M. Rojas-Barahona | Bo-Hsiang Tseng | Milica Gašić
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Reinforcement learning (RL) is a promising approach to solve dialogue policy optimisation. Traditional RL algorithms, however, fail to scale to large domains due to the curse of dimensionality. We propose a novel Dialogue Management architecture, based on Feudal RL, which decomposes the decision into two steps; a first step where a master policy selects a subset of primitive actions, and a second step where a primitive action is chosen from the selected subset. The structural information included in the domain ontology is used to abstract the dialogue state space, taking the decisions at each step using different parts of the abstracted state. This, combined with an information sharing mechanism between slots, increases the scalability to large domains. We show that an implementation of this approach, based on Deep-Q Networks, significantly outperforms previous state of the art in several dialogue domains and environments, without the need of any additional reward signal.

Deep Learning for Conversational AI
Pei-Hao Su | Nikola Mrkšić | Iñigo Casanueva | Ivan Vulić
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Spoken Dialogue Systems (SDS) have great commercial potential as they promise to revolutionise the way in which humans interact with machines. The advent of deep learning led to substantial developments in this area of NLP research, and the goal of this tutorial is to familiarise the research community with the recent advances in what some call the most difficult problem in NLP. From a research perspective, the design of spoken dialogue systems provides a number of significant challenges, as these systems depend on: a) solving several difficult NLP and decision-making tasks; and b) combining these into a functional dialogue system pipeline. A key long-term goal of dialogue system research is to enable open-domain systems that can converse about arbitrary topics and assist humans with completing a wide range of tasks. Furthermore, such systems need to autonomously learn on-line to improve their performance and recover from errors using both signals from their environment and from implicit and explicit user feedback. While the design of such systems has traditionally been modular, domain and language-specific, advances in deep learning have alleviated many of the design problems. The main purpose of this tutorial is to encourage dialogue research in the NLP community by providing the research background, a survey of available resources, and giving key insights to application of state-of-the-art SDS methodology into industry-scale conversational AI systems. We plan to introduce researchers to the pipeline framework for modelling goal-oriented dialogue systems, which includes three key components: 1) Language Understanding; 2) Dialogue Management; and 3) Language Generation. The differences between goal-oriented dialogue systems and chat-bot style conversational agents will be explained in order to show the motivation behind the design of both, with the main focus on the pipeline SDS framework. For each key component, we will define the research problem, provide a brief literature review and introduce the current state-of-the-art approaches. Complementary resources (e.g. available datasets and toolkits) will also be discussed. Finally, future work, outstanding challenges, and current industry practices will be presented. All of the presented material will be made available online for future reference.

MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling
Paweł Budzianowski | Tsung-Hsien Wen | Bo-Hsiang Tseng | Iñigo Casanueva | Stefan Ultes | Osman Ramadan | Milica Gašić
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Even though machine learning has become the major scene in dialogue research community, the real breakthrough has been blocked by the scale of data available.To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple domains and topics.At a size of 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora.The contribution of this work apart from the open-sourced dataset is two-fold:firstly, a detailed description of the data collection procedure along with a summary of data structure and analysis is provided. The proposed data-collection pipeline is entirely based on crowd-sourcing without the need of hiring professional annotators;secondly, a set of benchmark results of belief tracking, dialogue act and response generation is reported, which shows the usability of the data and sets a baseline for future studies.

Neural User Simulation for Corpus-based Policy Optimisation of Spoken Dialogue Systems
Florian Kreyssig | Iñigo Casanueva | Paweł Budzianowski | Milica Gašić
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

User Simulators are one of the major tools that enable offline training of task-oriented dialogue systems. For this task the Agenda-Based User Simulator (ABUS) is often used. The ABUS is based on hand-crafted rules and its output is in semantic form. Issues arise from both properties such as limited diversity and the inability to interface a text-level belief tracker. This paper introduces the Neural User Simulator (NUS) whose behaviour is learned from a corpus and which generates natural language, hence needing a less labelled dataset than simulators generating a semantic output. In comparison to much of the past work on this topic, which evaluates user simulators on corpus-based metrics, we use the NUS to train the policy of a reinforcement learning based Spoken Dialogue System. The NUS is compared to the ABUS by evaluating the policies that were trained using the simulators. Cross-model evaluation is performed i.e. training on one simulator and testing on the other. Furthermore, the trained policies are tested on real users. In both evaluation tasks the NUS outperformed the ABUS.

Addressing Objects and Their Relations: The Conversational Entity Dialogue Model
Stefan Ultes | Paweł Budzianowski | Iñigo Casanueva | Lina M. Rojas-Barahona | Bo-Hsiang Tseng | Yen-Chen Wu | Steve Young | Milica Gašić
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

Statistical spoken dialogue systems usually rely on a single- or multi-domain dialogue model that is restricted in its capabilities of modelling complex dialogue structures, e.g., relations. In this work, we propose a novel dialogue model that is centred around entities and is able to model relations as well as multiple entities of the same type. We demonstrate in a prototype implementation benefits of relation modelling on the dialogue level and show that a trained policy using these relations outperforms the multi-domain baseline. Furthermore, we show that by modelling the relations on the dialogue level, the system is capable of processing relations present in the user input and even learns to address them in the system response.

Feudal Dialogue Management with Jointly Learned Feature Extractors
Iñigo Casanueva | Paweł Budzianowski | Stefan Ultes | Florian Kreyssig | Bo-Hsiang Tseng | Yen-chen Wu | Milica Gašić
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

Reinforcement learning (RL) is a promising dialogue policy optimisation approach, but traditional RL algorithms fail to scale to large domains. Recently, Feudal Dialogue Management (FDM), has shown to increase the scalability to large domains by decomposing the dialogue management decision into two steps, making use of the domain ontology to abstract the dialogue state in each step. In order to abstract the state space, however, previous work on FDM relies on handcrafted feature functions. In this work, we show that these feature functions can be learned jointly with the policy model while obtaining similar performance, even outperforming the handcrafted features in several environments and domains.

Variational Cross-domain Natural Language Generation for Spoken Dialogue Systems
Bo-Hsiang Tseng | Florian Kreyssig | Paweł Budzianowski | Iñigo Casanueva | Yen-Chen Wu | Stefan Ultes | Milica Gašić
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

Cross-domain natural language generation (NLG) is still a difficult task within spoken dialogue modelling. Given a semantic representation provided by the dialogue manager, the language generator should generate sentences that convey desired information. Traditional template-based generators can produce sentences with all necessary information, but these sentences are not sufficiently diverse. With RNN-based models, the diversity of the generated sentences can be high, however, in the process some information is lost. In this work, we improve an RNN-based generator by considering latent information at the sentence level during generation using conditional variational auto-encoder architecture. We demonstrate that our model outperforms the original RNN-based generator, while yielding highly diverse sentences. In addition, our model performs better when the training data is limited.


PyDial: A Multi-domain Statistical Dialogue System Toolkit
Stefan Ultes | Lina M. Rojas-Barahona | Pei-Hao Su | David Vandyke | Dongho Kim | Iñigo Casanueva | Paweł Budzianowski | Nikola Mrkšić | Tsung-Hsien Wen | Milica Gašić | Steve Young
Proceedings of ACL 2017, System Demonstrations

Reward-Balancing for Statistical Spoken Dialogue Systems using Multi-objective Reinforcement Learning
Stefan Ultes | Paweł Budzianowski | Iñigo Casanueva | Nikola Mrkšić | Lina M. Rojas-Barahona | Pei-Hao Su | Tsung-Hsien Wen | Milica Gašić | Steve Young
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Reinforcement learning is widely used for dialogue policy optimization where the reward function often consists of more than one component, e.g., the dialogue success and the dialogue length. In this work, we propose a structured method for finding a good balance between these components by searching for the optimal reward component weighting. To render this search feasible, we use multi-objective reinforcement learning to significantly reduce the number of training dialogues required. We apply our proposed method to find optimized component weights for six domains and compare them to a default baseline.

Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning
Paweł Budzianowski | Stefan Ultes | Pei-Hao Su | Nikola Mrkšić | Tsung-Hsien Wen | Iñigo Casanueva | Lina M. Rojas-Barahona | Milica Gašić
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Human conversation is inherently complex, often spanning many different topics/domains. This makes policy learning for dialogue systems very challenging. Standard flat reinforcement learning methods do not provide an efficient framework for modelling such dialogues. In this paper, we focus on the under-explored problem of multi-domain dialogue management. First, we propose a new method for hierarchical reinforcement learning using the option framework. Next, we show that the proposed architecture learns faster and arrives at a better policy than the existing flat ones do. Moreover, we show how pretrained policies can be adapted to more complex systems with an additional set of new actions. In doing that, we show that our approach has the potential to facilitate policy optimisation for more sophisticated multi-domain dialogue systems.


Using phone features to improve dialogue state tracking generalisation to unseen states
Iñigo Casanueva | Thomas Hain | Mauro Nicolao | Phil Green
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue


Knowledge transfer between speakers for personalised dialogue management
Iñigo Casanueva | Thomas Hain | Heidi Christensen | Ricard Marxer | Phil Green
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue


homeService: Voice-enabled assistive technology in the home using cloud-based automatic speech recognition
Heidi Christensen | Iñigo Casanueva | Stuart Cunningham | Phil Green | Thomas Hain
Proceedings of the Fourth Workshop on Speech and Language Processing for Assistive Technologies