Oliver Lemon


2024

pdf
Divide and Conquer: Rethinking Ambiguous Candidate Identification in Multimodal Dialogues with Pseudo-Labelling
Bhathiya Hemanthage | Christian Dondrup | Hakan Bilen | Oliver Lemon
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Ambiguous Candidate Identification (ACI) in multimodal dialogue is the task of identifying all potential objects that a user’s utterance could be referring to in a visual scene, in cases where the reference cannot be uniquely determined. End-to-end models are the dominant approach for this task, but have limited real-world applicability due to unrealistic inference-time assumptions such as requiring predefined catalogues of items. Focusing on a more generalized and realistic ACI setup, we demonstrate that a modular approach, which first emphasizes language-only reasoning over dialogue context before performing vision-language fusion, significantly outperforms end-to-end trained baselines. To mitigate the lack of annotations for training the language-only module (student), we propose a pseudo-labelling strategy with a prompted Large Language Model (LLM) as the teacher.
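
The pseudo-labelling step lends itself to a compact illustration. Below is a minimal sketch, assuming a generic `call_llm` callable and an invented prompt and label format; the paper’s actual prompts, label scheme, and student architecture are not reproduced here.

```python
# Hypothetical sketch of LLM-teacher pseudo-labelling for the language-only
# student module; `call_llm`, the prompt, and the label format are assumptions.
from typing import Callable, List, Tuple

TEACHER_PROMPT = (
    "Dialogue:\n{dialogue}\n"
    "List the object types the last utterance could refer to, comma-separated:"
)

def pseudo_label(dialogues: List[str],
                 call_llm: Callable[[str], str]) -> List[Tuple[str, List[str]]]:
    """Use a prompted LLM as the teacher to label unannotated dialogues."""
    labelled = []
    for dialogue in dialogues:
        raw = call_llm(TEACHER_PROMPT.format(dialogue=dialogue))
        labels = [t.strip() for t in raw.split(",") if t.strip()]
        labelled.append((dialogue, labels))
    return labelled

# The (dialogue, labels) pairs then become training data for the language-only
# student, which runs before any vision-language fusion.
```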

pdf
Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers
Georgios Pantazopoulos | Alessandro Suglia | Oliver Lemon | Arash Eshghi
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

An effective method for combining frozen large language models (LLMs) and visual encoders involves a resampler module that creates a ‘visual prompt’ which is provided to the LLM, along with the textual prompt. While this approach has enabled impressive performance across many coarse-grained tasks like image captioning and visual question answering, more fine-grained tasks that require spatial understanding have not been thoroughly examined. In this paper, we use diagnostic classifiers to measure the extent to which the visual prompt produced by the resampler encodes spatial information. Our results show that this information is largely absent from the resampler output when kept frozen during training of the classifiers. However, when the resampler and classifier are trained jointly, we observe a significant performance boost. This shows that the compression achieved by the resamplers can in principle encode the requisite spatial information, but that more object-aware objectives are needed at the pretraining stage to facilitate this capability.
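
A diagnostic classifier here is essentially a lightweight probe trained on frozen features. Below is a minimal sketch of the probing setup, with random arrays standing in for the resampler’s visual-prompt vectors and for spatial-relation labels.

```python
# Sketch of a diagnostic (probing) classifier; the features are synthetic
# stand-ins for frozen resampler outputs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 768))  # stand-in for visual-prompt vectors
labels = rng.integers(0, 4, size=1000)   # e.g. left / right / above / below

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", accuracy_score(y_te, probe.predict(X_te)))
# Near-chance accuracy on real frozen features would indicate that spatial
# information is absent, matching the finding reported above.
```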

pdf
Multi-party Multimodal Conversations Between Patients, Their Companions, and a Social Robot in a Hospital Memory Clinic
Angus Addlesee | Neeraj Cherakara | Nivan Nelson | Daniel Hernandez Garcia | Nancie Gunson | Weronika Sieińska | Christian Dondrup | Oliver Lemon
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

We have deployed an LLM-based spoken dialogue system in a real hospital. The ARI social robot embodies our system, and patients and their companions can hold multi-party conversations with it together. In order to enable this multi-party ability, multimodality is critical. Our system, therefore, receives speech and video as input, and generates both speech and gestures (arm, head, and eye movements). In this paper, we describe our complex setting and the architecture of our dialogue system. Each component is detailed, and a video of the full system is available with the appropriate components highlighted in real-time. Our system decides when it should take its turn, generates human-like clarification requests when the patient pauses mid-utterance, answers in-domain questions (grounding to the in-prompt knowledge), and responds appropriately to out-of-domain requests (like generating jokes or quizzes). This latter feature is particularly remarkable as real patients often utter unexpected sentences that could not be handled previously.

pdf
Clarifying Completions: Evaluating How LLMs Respond to Incomplete Questions
Angus Addlesee | Oliver Lemon | Arash Eshghi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

People understand and produce language incrementally on a word-by-word basis. This gives rise to many characteristic conversational phenomena including long mid-sentence pauses that are followed by incremental clarification requests (iCRs) intended to recover the rest of the truncated turn (see Fig. 1; (A), (B), (C)). The ability to generate iCRs is important in natural conversational AI systems, and crucial to their accessibility to users with memory impairment. In this paper, we collect, release and analyse SLUICE-CR: a large corpus of 3000 human-produced iCRs. We then use this corpus to probe the incremental processing capability of a number of state-of-the-art LLMs by evaluating the quality of the model’s generated iCRs in response to incomplete questions. Our evaluations show that the ability to generate contextually appropriate iCRs only emerges at larger LLM sizes, and only when prompted with example iCRs from our corpus. They also indicate that autoregressive LMs are, in principle, able to both understand and generate language incrementally.
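
The probing setup can be made concrete in a few lines. The sketch below uses a small open model and invented example iCRs purely for illustration; the paper’s corpus examples, prompts, and evaluated LLMs differ.

```python
# Illustrative few-shot probe: prompt a language model with example iCRs,
# then present a truncated question. Model and examples are stand-ins.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

FEW_SHOT = (
    "User: Could you remind me where I put my...\n"
    "System: Where you put your what?\n"
    "User: What time is my appointment with...\n"
    "System: Your appointment with whom?\n"
)
truncated = "User: Can you tell me the name of the...\nSystem:"
out = generator(FEW_SHOT + truncated, max_new_tokens=15)
print(out[0]["generated_text"])
```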

2023

pdf
Multi-party Goal Tracking with LLMs: Comparing Pre-training, Fine-tuning, and Prompt Engineering
Angus Addlesee | Weronika Sieińska | Nancie Gunson | Daniel Hernandez Garcia | Christian Dondrup | Oliver Lemon
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

This paper evaluates the extent to which current LLMs can capture task-oriented multi-party conversations (MPCs). We have recorded and transcribed 29 MPCs between patients, their companions, and a social robot in a hospital. We then annotated this corpus for multi-party goal-tracking and intent-slot recognition. People share goals, answer each other’s goals, and provide other people’s goals in MPCs, none of which occur in dyadic interactions. To understand user goals in MPCs, we compared three methods in zero-shot and few-shot settings: we fine-tuned T5, created pre-training tasks to train DialogLM using LED, and employed prompt engineering techniques with GPT-3.5-turbo to determine which approach can complete this novel task with limited data. GPT-3.5-turbo significantly outperformed the others in a few-shot setting. The ‘reasoning’ style prompt, when given 7% of the corpus as example annotated conversations, was the best performing method. It correctly annotated 62.32% of the goal-tracking MPCs, and 69.57% of the intent-slot recognition MPCs. A ‘story’ style prompt increased model hallucination, which could be detrimental if deployed in safety-critical settings. We conclude that multi-party conversations still challenge state-of-the-art LLMs.

pdf
FurChat: An Embodied Conversational Agent using LLMs, Combining Open and Closed-Domain Dialogue with Facial Expressions
Neeraj Cherakara | Finny Varghese | Sheena Shabana | Nivan Nelson | Abhiram Karukayil | Rohith Kulothungan | Mohammed Afil Farhan | Birthe Nesset | Meriam Moujahid | Tanvi Dinkar | Verena Rieser | Oliver Lemon
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

We demonstrate an embodied conversational agent that can function as a receptionist and generate a mixture of open and closed-domain dialogue along with facial expressions, by using a large language model (LLM) to develop an engaging conversation. We deployed the system onto a Furhat robot, which is highly expressive and capable of using both verbal and nonverbal cues during interaction. The system was designed specifically for the National Robotarium to interact with visitors through natural conversations, providing them with information about the facilities, research, news, upcoming events, etc. The system utilises the state-of-the-art GPT-3.5 model to generate such information along with domain-general conversations and facial expressions based on prompt engineering.

pdf
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
Georgios Pantazopoulos | Malvina Nikandrou | Amit Parekh | Bhathiya Hemanthage | Arash Eshghi | Ioannis Konstas | Verena Rieser | Oliver Lemon | Alessandro Suglia
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Interactive and embodied tasks pose at least two fundamental challenges to existing Vision & Language (VL) models, including 1) grounding language in trajectories of actions and observations, and 2) referential disambiguation. To tackle these challenges, we propose an Embodied MultiModal Agent (EMMA): a unified encoder-decoder model that reasons over images and trajectories, and casts action prediction as multimodal text generation. By unifying all tasks as text generation, EMMA learns a language of actions which facilitates transfer across tasks. Unlike previous modular approaches with independently trained components, we use a single multitask model where each task contributes to goal completion. EMMA performs on par with similar models on several VL benchmarks and sets new state-of-the-art performance (36.81% success rate) on Dialog-guided Task Completion (DTC), a benchmark for evaluating dialog-guided agents in the Alexa Arena.
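
The “language of actions” idea reduces to a serialisation choice: every task instance becomes an input string and a target string. The sketch below shows one hypothetical serialisation; EMMA’s actual token format and action vocabulary are assumptions here.

```python
# Hypothetical serialisation casting action prediction as text generation.
def serialize_example(instruction, frame_captions, history):
    parts = [f"<instruction> {instruction}"]
    parts += [f"<frame> {c}" for c in frame_captions]
    parts += [f"<utterance> {u}" for u in history]
    return " ".join(parts)

src = serialize_example(
    "put the mug in the sink",
    ["a kitchen counter with a red mug"],
    ["which mug?", "the red one"],
)
tgt = "goto counter ; pickup mug ; goto sink ; place mug"
print(src, "->", tgt)
# A single encoder-decoder model trained on such (src, tgt) pairs treats
# action prediction, QA, and captioning uniformly as text generation.
```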

pdf
SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene Representation
Bhathiya Hemanthage | Christian Dondrup | Phil Bartie | Oliver Lemon
Proceedings of the 15th International Conference on Computational Semantics

SimpleMTOD is a simple language model which recasts several sub-tasks in multimodal task-oriented dialogues as sequence prediction tasks. SimpleMTOD is built on a large-scale transformer-based auto-regressive architecture, which has already proven to be successful in uni-modal task-oriented dialogues, and effectively leverages transfer learning from pretrained GPT-2. In order to capture the semantics of visual scenes, we introduce both local and de-localized tokens for objects within a scene. De-localized tokens represent the type of an object rather than the specific object itself and so possess a consistent meaning across the dataset. SimpleMTOD achieves a state-of-the-art BLEU score (0.327) in the Response Generation sub-task of the SIMMC 2.0 test-std dataset while performing on par in other multimodal sub-tasks: Disambiguation, Coreference Resolution, and Dialog State Tracking. This is despite taking a minimalist approach for extracting visual (and non-visual) information. In addition, the model does not rely on task-specific architectural changes such as classification heads.
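
De-localized tokens are straightforward to illustrate: scene-specific object identifiers are mapped to tokens naming the object’s type, so the same token carries the same meaning in every scene. A minimal sketch with an invented token syntax:

```python
def delocalize(objects):
    """Map scene-specific object ids to dataset-consistent type tokens."""
    return {obj["id"]: f"<{obj['type']}>" for obj in objects}

scene = [
    {"id": 17, "type": "jacket"},
    {"id": 42, "type": "jacket"},
    {"id": 58, "type": "shoe"},
]
print(delocalize(scene))  # {17: '<jacket>', 42: '<jacket>', 58: '<shoe>'}
```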

2022

pdf bib
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue
Oliver Lemon | Dilek Hakkani-Tur | Junyi Jessy Li | Arash Ashrafzadeh | Daniel Hernández Garcia | Malihe Alikhani | David Vandyke | Ondřej Dušek
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
A Visually-Aware Conversational Robot Receptionist
Nancie Gunson | Daniel Hernandez Garcia | Weronika Sieińska | Angus Addlesee | Christian Dondrup | Oliver Lemon | Jose L. Part | Yanchao Yu
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Socially Assistive Robots (SARs) have the potential to play an increasingly important role in a variety of contexts including healthcare, but most existing systems have very limited interactive capabilities. We will demonstrate a robot receptionist that not only supports task-based and social dialogue via natural spoken conversation but is also capable of visually grounded dialogue; able to perceive and discuss the shared physical environment (e.g. helping users to locate personal belongings or objects of interest). Task-based dialogues include check-in, navigation and FAQs about facilities, alongside social features such as chit-chat, access to the latest news and a quiz game to play while waiting. We also show how visual context (objects and their spatial relations) can be combined with linguistic representations of dialogue context, to support visual dialogue and question answering. We will demonstrate the system on a humanoid ARI robot, which is being deployed in a hospital reception area.

pdf
Demonstrating EMMA: Embodied MultiModal Agent for Language-guided Action Execution in 3D Simulated Environments
Alessandro Suglia | Bhathiya Hemanthage | Malvina Nikandrou | Georgios Pantazopoulos | Amit Parekh | Arash Eshghi | Claudio Greco | Ioannis Konstas | Oliver Lemon | Verena Rieser
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

We demonstrate EMMA, an embodied multimodal agent which has been developed for the Alexa Prize SimBot challenge. The agent acts within a 3D simulated environment for household tasks. EMMA is a unified and multimodal generative model aimed at solving embodied tasks. In contrast to previous work, our approach treats multiple multimodal tasks as a single multimodal conditional text generation problem, where a model learns to output text given both language and visual input. Furthermore, we showcase that a single generative agent can solve tasks with visual inputs of varying length, such as answering questions about static images, or executing actions given a sequence of previous frames and dialogue utterances. The demo system will allow users to interact conversationally with EMMA in embodied dialogues in different 3D environments from the TEACh dataset.

2021

pdf
The Spoon Is in the Sink: Assisting Visually Impaired People in the Kitchen
Katie Baker | Amit Parekh | Adrien Fabre | Angus Addlesee | Ruben Kruiper | Oliver Lemon
Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)

Visual Question Answering (VQA) systems are increasingly adept at a variety of tasks, and this technology can be used to assist blind and partially sighted people. To do this, the system’s responses must not only be accurate, but usable. It is also vital for assistive technologies to be designed with a focus on: (1) privacy, as the camera may capture a user’s mail, medication bottles, or other sensitive information; (2) transparency, so that the system’s behaviour can be explained and trusted by users; and (3) controllability, to tailor the system for a particular domain or user group. We have therefore extended a conversational VQA framework, called Aye-saac, with these objectives in mind. Specifically, we gave Aye-saac the ability to answer visual questions in the kitchen, a particularly challenging area for visually impaired people. Our system can now answer questions about quantity, positioning, and system confidence with regard to 299 kitchen objects. Questions about the spatial relations between these objects are particularly helpful to visually impaired people, and our system outputs more usable answers than other state-of-the-art end-to-end VQA systems.

pdf
An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games
Alessandro Suglia | Yonatan Bisk | Ioannis Konstas | Antonio Vergari | Emanuele Bastianelli | Andrea Vanzo | Oliver Lemon
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Guessing games are a prototypical instance of the “learning by interacting” paradigm. This work investigates how well an artificial agent can benefit from playing guessing games when later asked to perform on novel NLP downstream tasks such as Visual Question Answering (VQA). We propose two ways to exploit playing guessing games: 1) a supervised learning scenario in which the agent learns to mimic successful guessing games and 2) a novel way for an agent to play by itself, called Self-play via Iterated Experience Learning (SPIEL). We evaluate the ability of both procedures to generalise: an in-domain evaluation shows an increased accuracy (+7.79) compared with competitors on the evaluation suite CompGuessWhat?!; a transfer evaluation shows improved performance for VQA on the TDIUC dataset in terms of harmonic average accuracy (+5.31) thanks to more fine-grained object representations learned via SPIEL.

2020

pdf
Imagining Grounded Conceptual Representations from Perceptual Information in Situated Guessing Games
Alessandro Suglia | Antonio Vergari | Ioannis Konstas | Yonatan Bisk | Emanuele Bastianelli | Andrea Vanzo | Oliver Lemon
Proceedings of the 28th International Conference on Computational Linguistics

In visual guessing games, a Guesser has to identify a target object in a scene by asking questions to an Oracle. An effective strategy for the players is to learn conceptual representations of objects that are both discriminative and expressive enough to ask questions and guess correctly. However, as shown by Suglia et al. (2020), existing models fail to learn truly multi-modal representations, relying instead on gold category labels for objects in the scene both at training and inference time. This provides an unnatural performance advantage when categories at inference time match those at training time, and it causes models to fail in more realistic “zero-shot” scenarios where out-of-domain object categories are involved. To overcome this issue, we introduce a novel “imagination” module based on Regularized Auto-Encoders that learns context-aware and category-aware latent embeddings without relying on category labels at inference time. Our imagination module outperforms state-of-the-art competitors by 8.26% gameplay accuracy in the CompGuessWhat?! zero-shot scenario (Suglia et al., 2020), and it improves the Oracle and Guesser accuracy by 2.08% and 12.86% in the GuessWhat?! benchmark, when no gold categories are available at inference time. The imagination module also boosts reasoning about object properties and attributes.

pdf
CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning
Alessandro Suglia | Ioannis Konstas | Andrea Vanzo | Emanuele Bastianelli | Desmond Elliott | Stella Frank | Oliver Lemon
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Approaches to Grounded Language Learning are commonly focused on a single task-based final performance measure which may not depend on desirable properties of the learned hidden representations, such as their ability to predict object attributes or generalize to unseen situations. To remedy this, we present GroLLA, an evaluation framework for Grounded Language Learning with Attributes based on three sub-tasks: 1) Goal-oriented evaluation; 2) Object attribute prediction evaluation; and 3) Zero-shot evaluation. We also propose a new dataset, CompGuessWhat?!, as an instance of this framework for evaluating the quality of learned neural representations, in particular with respect to attribute grounding. To this end, we extend the original GuessWhat?! dataset by including a semantic layer on top of the perceptual one. Specifically, we enrich the VisualGenome scene graphs associated with the GuessWhat?! images with several attributes from resources such as VISA and ImSitu. We then compare several hidden state representations from current state-of-the-art approaches to Grounded Language Learning. By using diagnostic classifiers, we show that current models’ learned representations are not expressive enough to encode object attributes (average F1 of 44.27). In addition, they do not learn strategies or representations that are robust enough to perform well when novel scenes or objects are involved in gameplay (zero-shot best accuracy 50.06%).

pdf
Conversational Agents for Intelligent Buildings
Weronika Sieińska | Christian Dondrup | Nancie Gunson | Oliver Lemon
Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue

We will demonstrate a deployed conversational AI system that acts as the host of a smart building on a university campus. The system combines open-domain social conversation with task-based conversation regarding navigation in the building, live resource updates (e.g. available computers) and events in the building. We are able to demonstrate the system on several platforms: Google Home devices, Android phones, and a Furhat robot.

2019

pdf
Data-Efficient Goal-Oriented Conversation with Dialogue Knowledge Transfer Networks
Igor Shalyminov | Sungjin Lee | Arash Eshghi | Oliver Lemon
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Goal-oriented dialogue systems are now being widely adopted in industry where it is of key importance to maintain a rapid prototyping cycle for new products and domains. Data-driven dialogue system development has to be adapted to meet this requirement — therefore, reducing the amount of data and annotations necessary for training such systems is a central research problem. In this paper, we present the Dialogue Knowledge Transfer Network (DiKTNet), a state-of-the-art approach to goal-oriented dialogue generation which only uses a few example dialogues (i.e. few-shot learning), none of which has to be annotated. We achieve this by performing a 2-stage training. Firstly, we perform unsupervised dialogue representation pre-training on a large source of goal-oriented dialogues in multiple domains, the MetaLWOz corpus. Secondly, at the transfer stage, we train DiKTNet using this representation together with 2 other textual knowledge sources with different levels of generality: ELMo encoder and the main dataset’s source domains. Our main dataset is the Stanford Multi-Domain dialogue corpus. We evaluate our model on it in terms of BLEU and Entity F1 scores, and show that our approach significantly and consistently improves upon a series of baseline models as well as over the previous state-of-the-art dialogue generation model, ZSDG. The improvement upon the latter — up to 10% in Entity F1 and an average of 3% in BLEU score — is achieved using only the equivalent of 10% of ZSDG’s in-domain training data.
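
The two-stage recipe can be sketched compactly. The toy LSTM language model and random tensors below stand in for DiKTNet’s architecture, the MetaLWOz source corpus, and the few in-domain dialogues; only the pretrain-then-transfer structure is meant to carry over.

```python
# Toy two-stage training: (1) pre-train on a large multi-domain source,
# (2) transfer to a target domain from a handful of dialogues.
import torch
import torch.nn as nn

VOCAB, DIM = 100, 32

class DialogueLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.LSTM(DIM, DIM, batch_first=True)
        self.out = nn.Linear(DIM, VOCAB)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

def train(model, batches, steps, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        for batch in batches:
            logits = model(batch[:, :-1])        # predict each next token
            loss = loss_fn(logits.reshape(-1, VOCAB),
                           batch[:, 1:].reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()

model = DialogueLM()
source = [torch.randint(0, VOCAB, (8, 20))]  # large multi-domain corpus
target = [torch.randint(0, VOCAB, (2, 20))]  # few in-domain dialogues
train(model, source, steps=10, lr=1e-3)      # stage 1: pre-training
train(model, target, steps=10, lr=1e-4)      # stage 2: few-shot transfer
```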

pdf
Few-Shot Dialogue Generation Without Annotated Data: A Transfer Learning Approach
Igor Shalyminov | Sungjin Lee | Arash Eshghi | Oliver Lemon
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Learning with minimal data is one of the key challenges in the development of practical, production-ready goal-oriented dialogue systems. In a real-world enterprise setting where dialogue systems are developed rapidly and are expected to work robustly for an ever-growing variety of domains, products, and scenarios, efficient learning from a limited number of examples becomes indispensable. In this paper, we introduce a technique to achieve state-of-the-art dialogue generation performance in a few-shot setup, without using any annotated data. We do this by leveraging background knowledge from a larger, more highly represented dialogue source — namely, the MetaLWOz dataset. We evaluate our model on the Stanford Multi-Domain Dialogue Dataset, consisting of human-human goal-oriented dialogues in in-car navigation, appointment scheduling, and weather information domains. We show that our few-shot approach achieves state-of-the-art results on that dataset by consistently outperforming the previous best model in terms of BLEU and Entity F1 scores, while being more data-efficient in that it requires no data annotation.

pdf
Hierarchical Multi-Task Natural Language Understanding for Cross-domain Conversational AI: HERMIT NLU
Andrea Vanzo | Emanuele Bastianelli | Oliver Lemon
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

We present a new neural architecture for wide-coverage Natural Language Understanding in Spoken Dialogue Systems. We develop a hierarchical multi-task architecture, which delivers a multi-layer representation of sentence meaning (i.e., Dialogue Acts and Frame-like structures). The architecture is a hierarchy of self-attention mechanisms and BiLSTM encoders followed by CRF tagging layers. We describe a variety of experiments, showing that our approach obtains promising results on a dataset annotated with Dialogue Acts and Frame Semantics. Moreover, we demonstrate its applicability to a different, publicly available NLU dataset annotated with domain-specific intents and corresponding semantic roles, providing overall performance higher than state-of-the-art tools such as RASA, Dialogflow, LUIS, and Watson. For example, we show an average 4.45% improvement in entity tagging F-score over RASA, Dialogflow, and LUIS.
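
The hierarchy is simple to express in code: a lower recurrent layer feeds one tagging head, and its hidden states feed a higher layer for the next head. The sketch below keeps only that skeleton; the paper’s self-attention and CRF tagging layers are omitted, and all sizes and tag counts are illustrative.

```python
# Skeleton of a hierarchical multi-task tagger (self-attention and CRF
# layers omitted for brevity; dimensions and tag inventories are invented).
import torch
import torch.nn as nn

class HierarchicalTagger(nn.Module):
    def __init__(self, vocab=1000, dim=64, n_acts=10, n_frames=20):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.enc1 = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.act_head = nn.Linear(2 * dim, n_acts)
        self.enc2 = nn.LSTM(2 * dim, dim, batch_first=True, bidirectional=True)
        self.frame_head = nn.Linear(2 * dim, n_frames)

    def forward(self, tokens):
        h1, _ = self.enc1(self.emb(tokens))
        h2, _ = self.enc2(h1)  # the higher layer consumes the lower layer
        return self.act_head(h1), self.frame_head(h2)

tokens = torch.randint(0, 1000, (4, 12))     # a batch of token-id sequences
act_logits, frame_logits = HierarchicalTagger()(tokens)
print(act_logits.shape, frame_logits.shape)  # per-token logits for each task
```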

2018

pdf bib
Neural Response Ranking for Social Conversation: A Data-Efficient Approach
Igor Shalyminov | Ondřej Dušek | Oliver Lemon
Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI

The overall objective of ‘social’ dialogue systems is to support engaging, entertaining, and lengthy conversations on a wide variety of topics, including social chit-chat. Apart from raw dialogue data, user-provided ratings are the most common signal used to train such systems to produce engaging responses. In this paper we show that social dialogue systems can be trained effectively from raw unannotated data. Using a dataset of real conversations collected in the 2017 Alexa Prize challenge, we developed a neural ranker for selecting ‘good’ system responses to user utterances, i.e. responses which are likely to lead to long and engaging conversations. We show that (1) our neural ranker consistently outperforms several strong baselines when trained to optimise for user ratings; (2) when trained on larger amounts of data and only using conversation length as the objective, the ranker performs better than the one trained using ratings – ultimately reaching a Precision@1 of 0.87. This advance will make data collection for social conversational agents simpler and less expensive in the future.
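
At inference time such a ranker simply scores each candidate response against the dialogue context and returns the argmax. The sketch below uses bag-of-embeddings encoders as a stand-in for the paper’s architecture; training would regress the score onto user ratings or, per finding (2), eventual conversation length.

```python
# Stand-in response ranker: score (context, candidate) pairs, pick the best.
import torch
import torch.nn as nn

class Ranker(nn.Module):
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.score = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                   nn.Linear(dim, 1))

    def forward(self, context_ids, candidate_ids):
        ctx = self.emb(context_ids).mean(dim=1)     # bag-of-embeddings context
        cand = self.emb(candidate_ids).mean(dim=1)  # bag-of-embeddings response
        return self.score(torch.cat([ctx, cand], dim=-1)).squeeze(-1)

ranker = Ranker()
context = torch.randint(0, 1000, (5, 20))     # same context repeated 5 times
candidates = torch.randint(0, 1000, (5, 12))  # five candidate responses
best = ranker(context, candidates).argmax().item()
print("selected candidate:", best)
```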

2017

pdf
Bootstrapping incremental dialogue systems from minimal data: the generalisation power of dialogue grammars
Arash Eshghi | Igor Shalyminov | Oliver Lemon
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We investigate an end-to-end method for automatically inducing task-based dialogue systems from small amounts of unannotated dialogue data. It combines an incremental semantic grammar - Dynamic Syntax and Type Theory with Records (DS-TTR) - with Reinforcement Learning (RL), where language generation and dialogue management are a joint decision problem. The systems thus produced are incremental: dialogues are processed word-by-word, shown previously to be essential in supporting natural, spontaneous dialogue. We hypothesised that the rich linguistic knowledge within the grammar should enable a combinatorially large number of dialogue variations to be processed, even when trained on very few dialogues. Our experiments show that our model can process 74% of the Facebook AI bAbI dataset even when trained on only 0.13% of the data (5 dialogues). It can in addition process 65% of bAbI+, a corpus we created by systematically adding incremental dialogue phenomena such as restarts and self-corrections to bAbI. We compare our model with a state-of-the-art retrieval model, MEMN2N. We find that, in terms of semantic accuracy, the MEMN2N model shows very poor robustness to the bAbI+ transformations even when trained on the full bAbI dataset.
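
The bAbI+ construction is a systematic text transformation. The sketch below shows one such transformation, injecting self-corrections; the lexicon and repair template are invented, and the actual bAbI+ tooling covers more phenomena (hesitations, restarts) with its own templates.

```python
# Illustrative bAbI+-style corruption: inject self-corrections into clean
# utterances (wrong word first, then a repair). Lexicon is invented.
import random

DISTRACTORS = {"italian": "french", "rome": "paris", "cheap": "expensive"}

def add_self_correction(utterance, p=0.5, seed=0):
    random.seed(seed)
    out = []
    for w in utterance.split():
        if w in DISTRACTORS and random.random() < p:
            out += [DISTRACTORS[w], "uh", "no,", "I", "mean", w]
        else:
            out.append(w)
    return " ".join(out)

print(add_self_correction("book a cheap italian restaurant in rome", p=1.0))
# -> "book a expensive uh no, I mean cheap french uh no, I mean italian ..."
```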

pdf
Evaluating Persuasion Strategies and Deep Reinforcement Learning methods for Negotiation Dialogue agents
Simon Keizer | Markus Guhe | Heriberto Cuayáhuitl | Ioannis Efstathiou | Klaus-Peter Engelbrecht | Mihai Dobre | Alex Lascarides | Oliver Lemon
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

In this paper we present a comparative evaluation of various negotiation strategies within an online version of the game “Settlers of Catan”. The comparison is based on human subjects playing games against artificial game-playing agents (‘bots’) which implement different negotiation dialogue strategies, using a chat dialogue interface to negotiate trades. Our results suggest that a negotiation strategy that uses persuasion, as well as a strategy that is trained from data using Deep Reinforcement Learning, both lead to an improved win rate against humans, compared to previous rule-based and supervised learning baseline dialogue negotiators.

pdf bib
The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings
Yanchao Yu | Arash Eshghi | Gregory Mills | Oliver Lemon
Proceedings of the Sixth Workshop on Vision and Language

We motivate and describe a new freely available human-human dialogue data set for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner. The data has been collected using a novel, character-by-character variant of the DiET chat tool (Healey et al., 2003; anon.) with a novel task, where a Learner needs to learn invented visual attribute words (such as “burchak” for square) from a tutor. As such, the text-based interactions closely resemble face-to-face conversation and thus contain many of the linguistic phenomena encountered in natural, spontaneous dialogue. These include self- and other-correction, mid-sentence continuations, interruptions, turn overlaps, fillers, hedges and many kinds of ellipsis. We also present a generic n-gram framework for building user (i.e. tutor) simulations from this type of incremental dialogue data, which is freely available to researchers. We show that the simulations produce outputs that are similar to the original data (e.g. 78% turn match similarity). Finally, we train and evaluate a Reinforcement Learning dialogue control agent for learning visually grounded word meanings, trained from the BURCHAK corpus. The learned policy shows comparable performance to a rule-based system built previously.
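
An n-gram user simulation of this kind estimates transition counts over tutor turns and samples new ones. The toy bigram version below illustrates the idea on a stand-in corpus; the released framework is generic over n and is trained on the actual BURCHAK turns.

```python
# Toy bigram user simulation trained on a stand-in corpus of tutor turns.
import random
from collections import Counter, defaultdict

corpus = [
    "<s> this one is burchak </s>",
    "<s> no it is a burchak shape </s>",
    "<s> yes that is right </s>",
]

bigrams = defaultdict(Counter)
for turn in corpus:
    toks = turn.split()
    for a, b in zip(toks, toks[1:]):
        bigrams[a][b] += 1

def sample_turn(max_len=12, seed=1):
    random.seed(seed)
    tok, out = "<s>", []
    while tok != "</s>" and len(out) < max_len:
        successors = bigrams[tok]
        tok = random.choices(list(successors),
                             weights=list(successors.values()))[0]
        out.append(tok)
    return " ".join(t for t in out if t != "</s>")

print(sample_turn())
```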

pdf bib
Learning how to Learn: An Adaptive Dialogue Agent for Incrementally Learning Visually Grounded Word Meanings
Yanchao Yu | Arash Eshghi | Oliver Lemon
Proceedings of the First Workshop on Language Grounding for Robotics

We present an optimised multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human tutor, trained on real human-human tutoring data. Within a life-long interactive learning period, the agent, trained using Reinforcement Learning (RL), must be able to handle natural conversations with human users, and achieve good learning performance (i.e. accuracy) while minimising human effort in the learning process. We train and evaluate this system in interaction with a simulated human tutor, which is built on the BURCHAK corpus – a Human-Human Dialogue dataset for the visual learning task. The results show that: 1) The learned policy can coherently interact with the simulated user to achieve the goal of the task (i.e. learning visual attributes of objects, e.g. colour and shape); and 2) it finds a better trade-off between classifier accuracy and tutoring costs than hand-crafted rule-based policies, including ones with dynamic policies.

pdf
Sympathy Begins with a Smile, Intelligence Begins with a Word: Use of Multimodal Features in Spoken Human-Robot Interaction
Jekaterina Novikova | Christian Dondrup | Ioannis Papaioannou | Oliver Lemon
Proceedings of the First Workshop on Language Grounding for Robotics

Recognition of social signals, coming from human facial expressions or prosody of human speech, is a popular research topic in human-robot interaction studies. There is also a long line of research in the spoken dialogue community that investigates user satisfaction in relation to dialogue characteristics. However, very little research relates a combination of multimodal social signals and language features detected during spoken face-to-face human-robot interaction to the resulting user perception of a robot. In this paper we show how different emotional facial expressions of human users, in combination with prosodic characteristics of human speech and features of human-robot dialogue, correlate with users’ impressions of the robot after a conversation. We find that happiness in the user’s recognised facial expression strongly correlates with likeability of a robot, while dialogue-related features (such as number of human turns or number of sentences per robot utterance) correlate with perceiving a robot as intelligent. In addition, we show that emotional features of facial expressions and prosody are better predictors of human ratings related to perceived robot likeability and anthropomorphism, while linguistic and non-linguistic features more often predict perceived robot intelligence and interpretability. As such, these characteristics may in future be used as an online reward signal for in-situ Reinforcement Learning-based adaptive human-robot dialogue systems.

pdf
VOILA: An Optimised Dialogue System for Interactively Learning Visually-Grounded Word Meanings (Demonstration System)
Yanchao Yu | Arash Eshghi | Oliver Lemon
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

We present VOILA: an optimised, multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human user. VOILA is: (1) able to learn new visual categories interactively from users from scratch; (2) trained on real human-human dialogues in the same domain, and so is able to conduct natural spontaneous dialogue; (3) optimised to find the most effective trade-off between the accuracy of the visual categories it learns and the cost it incurs to users. VOILA is deployed on Furhat, a human-like, multi-modal robot head with back-projection of the face, and a graphical virtual character.

2016

pdf
Interactively Learning Visually Grounded Word Meanings from a Human Tutor
Yanchao Yu | Arash Eshghi | Oliver Lemon
Proceedings of the 5th Workshop on Vision and Language

pdf
Training an adaptive dialogue policy for interactive learning of visually grounded word meanings
Yanchao Yu | Arash Eshghi | Oliver Lemon
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
Incremental Generation of Visually Grounded Language in Situated Dialogue (demonstration system)
Yanchao Yu | Arash Eshghi | Oliver Lemon
Proceedings of the 9th International Natural Language Generation conference

pdf
Crowd-sourcing NLG Data: Pictures Elicit Better Data.
Jekaterina Novikova | Oliver Lemon | Verena Rieser
Proceedings of the 9th International Natural Language Generation conference

pdf
Natural Language Generation enhances human decision-making with uncertain information
Dimitra Gkatzia | Oliver Lemon | Verena Rieser
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

pdf
Comparing Attribute Classifiers for Interactive Language Grounding
Yanchao Yu | Arash Eshghi | Oliver Lemon
Proceedings of the Fourth Workshop on Vision and Language

pdf
A Game-Based Setup for Data Collection and Task-Based Evaluation of Uncertain Information Presentation
Dimitra Gkatzia | Amanda Cercas Curry | Verena Rieser | Oliver Lemon
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)

2014

pdf
Multi-threaded Interaction Management for Dynamic Spatial Applications
Srinivasan Janarthanam | Oliver Lemon
Proceedings of the EACL 2014 Workshop on Dialogue in Motion

pdf
Learning non-cooperative dialogue behaviours
Ioannis Efstathiou | Oliver Lemon
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

pdf
The PARLANCE mobile application for interactive search in English and Mandarin
Helen Hastie | Marie-Aude Aufaure | Panos Alexopoulos | Hugues Bouchard | Catherine Breslin | Heriberto Cuayáhuitl | Nina Dethlefs | Milica Gašić | James Henderson | Oliver Lemon | Xingkun Liu | Peter Mika | Nesrine Ben Mustapha | Tim Potter | Verena Rieser | Blaise Thomson | Pirros Tsiakoulis | Yves Vanrompay | Boris Villazon-Terrazas | Majid Yazdani | Steve Young | Yanchao Yu
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

pdf
Multi-adaptive Natural Language Generation using Principal Component Regression
Dimitra Gkatzia | Helen Hastie | Oliver Lemon
Proceedings of the 8th International Natural Language Generation Conference (INLG)

pdf
Comparing Multi-label Classification with Reinforcement Learning for Summarisation of Time-series Data
Dimitra Gkatzia | Helen Hastie | Oliver Lemon
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Cluster-based Prediction of User Ratings for Stylistic Surface Realisation
Nina Dethlefs | Heriberto Cuayáhuitl | Helen Hastie | Verena Rieser | Oliver Lemon
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Finding middle ground? Multi-objective Natural Language Generation from time-series data
Dimitra Gkatzia | Helen Hastie | Oliver Lemon
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

pdf
Adaptive Generation in Dialogue Systems Using Dynamic User Modeling
Srinivasan Janarthanam | Oliver Lemon
Computational Linguistics, Volume 40, Issue 4 - December 2014

2013

pdf
Conditional Random Fields for Responsive Surface Realisation using Global Features
Nina Dethlefs | Helen Hastie | Heriberto Cuayáhuitl | Oliver Lemon
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Evaluating a City Exploration Dialogue System with Integrated Question-Answering and Pedestrian Navigation
Srinivasan Janarthanam | Oliver Lemon | Phil Bartie | Tiphaine Dalmas | Anna Dickinson | Xingkun Liu | William Mackaness | Bonnie Webber
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Generating Student Feedback from Time-Series Data Using Reinforcement Learning
Dimitra Gkatzia | Helen Hastie | Srinivasan Janarthanam | Oliver Lemon
Proceedings of the 14th European Workshop on Natural Language Generation

pdf
A Multithreaded Conversational Interface for Pedestrian Navigation and Question Answering
Srinivasan Janarthanam | Oliver Lemon | Xingkun Liu | Phil Bartie | William Mackaness | Tiphaine Dalmas
Proceedings of the SIGDIAL 2013 Conference

pdf
Demonstration of the PARLANCE system: a data-driven incremental, spoken dialogue system for interactive search
Helen Hastie | Marie-Aude Aufaure | Panos Alexopoulos | Heriberto Cuayáhuitl | Nina Dethlefs | Milica Gasic | James Henderson | Oliver Lemon | Xingkun Liu | Peter Mika | Nesrine Ben Mustapha | Verena Rieser | Blaise Thomson | Pirros Tsiakoulis | Yves Vanrompay
Proceedings of the SIGDIAL 2013 Conference

pdf
Training and evaluation of an MDP model for social multi-user human-robot interaction
Simon Keizer | Mary Ellen Foster | Oliver Lemon | Andre Gaschler | Manuel Giuliani
Proceedings of the SIGDIAL 2013 Conference

pdf
Impact of ASR N-Best Information on Bayesian Dialogue Act Recognition
Heriberto Cuayáhuitl | Nina Dethlefs | Helen Hastie | Oliver Lemon
Proceedings of the SIGDIAL 2013 Conference

pdf
A Simple and Generic Belief Tracking Mechanism for the Dialog State Tracking Challenge: On the believability of observed information
Zhuoran Wang | Oliver Lemon
Proceedings of the SIGDIAL 2013 Conference

2012

pdf
A Web-based Evaluation Framework for Spatial Instruction-Giving Systems
Srinivasan Janarthanam | Oliver Lemon | Xingkun Liu
Proceedings of the ACL 2012 System Demonstrations

pdf
Optimising Incremental Generation for Spoken Dialogue Systems: Reducing the Need for Fillers
Nina Dethlefs | Helen Hastie | Verena Rieser | Oliver Lemon
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference

pdf
Integrating Location, Visibility, and Question-Answering in a Spoken Dialogue System for Pedestrian City Exploration
Srinivasan Janarthanam | Oliver Lemon | Xingkun Liu | Phil Bartie | William Mackaness | Tiphaine Dalmas | Jana Goetze
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
Incremental Spoken Dialogue Systems: Tools and Data
Helen Hastie | Oliver Lemon | Nina Dethlefs
NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)

pdf
A Statistical Spoken Dialogue System using Complex User Goals and Value Directed Compression
Paul A. Crook | Zhuoran Wang | Xingkun Liu | Oliver Lemon
Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Optimising Incremental Dialogue Decisions Using Information Density for Interactive Systems
Nina Dethlefs | Helen Hastie | Verena Rieser | Oliver Lemon
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Spoken Dialog Challenge 2010: Comparison of Live and Control Test Results
Alan W Black | Susanne Burger | Alistair Conkie | Helen Hastie | Simon Keizer | Oliver Lemon | Nicolas Merigaud | Gabriel Parent | Gabriel Schubiner | Blaise Thomson | Jason D. Williams | Kai Yu | Steve Young | Maxine Eskenazi
Proceedings of the SIGDIAL 2011 Conference

pdf
“The day after the day after tomorrow?” A machine learning approach to adaptive temporal expression generation: training and evaluation with real users
Srinivasan Janarthanam | Helen Hastie | Oliver Lemon | Xingkun Liu
Proceedings of the SIGDIAL 2011 Conference

pdf bib
Talkin’ bout a revolution (statistically speaking) [Invited Talk]
Oliver Lemon
Proceedings of the 13th European Workshop on Natural Language Generation

pdf
Adaptive Information Presentation for Spoken Dialogue Systems: Evaluation with real users
Verena Rieser | Simon Keizer | Oliver Lemon | Xingkun Liu
Proceedings of the 13th European Workshop on Natural Language Generation

pdf
The GRUVE Challenge: Generating Routes under Uncertainty in Virtual Environments
Srini Janarthanam | Oliver Lemon
Proceedings of the 13th European Workshop on Natural Language Generation

pdf
Learning and Evaluation of Dialogue Strategies for New Applications: Empirical Methods for Optimization from Small Data Sets
Verena Rieser | Oliver Lemon
Computational Linguistics, Volume 37, Issue 1 - March 2011

2010

pdf
Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems
Srinivasan Janarthanam | Oliver Lemon
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf
Optimising Information Presentation for Spoken Dialogue Systems
Verena Rieser | Oliver Lemon | Xingkun Liu
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf
Generation Under Uncertainty
Oliver Lemon | Srini Janarthanam | Verena Rieser
Proceedings of the 6th International Natural Language Generation Conference

pdf
Adaptive Referring Expression Generation in Spoken Dialogue Systems: Evaluation with Real Users
Srinivasan Janarthanam | Oliver Lemon
Proceedings of the SIGDIAL 2010 Conference

pdf
Representing Uncertainty about Complex User Goals in Statistical Dialogue Systems
Paul A. Crook | Oliver Lemon
Proceedings of the SIGDIAL 2010 Conference

2009

pdf
User Simulations for Context-Sensitive Speech Recognition in Spoken Dialogue Systems
Oliver Lemon | Ioannis Konstas
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf
Natural Language Generation as Planning Under Uncertainty for Spoken Dialogue Systems
Verena Rieser | Oliver Lemon
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf
Learning Lexical Alignment Policies for Generating Referring Expressions for Spoken Dialogue Systems
Srinivasan Janarthanam | Oliver Lemon
Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)

pdf
A Wizard-of-Oz Environment to Study Referring Expression Generation in a Situated Spoken Dialogue Task
Srinivasan Janarthanam | Oliver Lemon
Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)

pdf
A Two-Tier User Simulation Model for Reinforcement Learning of Adaptive Referring Expression Generation Policies
Srinivasan Janarthanam | Oliver Lemon
Proceedings of the SIGDIAL 2009 Conference

pdf
Automatic Generation of Information State Update Dialogue Systems that Dynamically Create Voice XML, as Demonstrated on the iPhone
Helen Hastie | Xingkun Liu | Oliver Lemon
Proceedings of the SIGDIAL 2009 Conference

2008

pdf
Learning Effective Multimodal Dialogue Strategies from Wizard-of-Oz Data: Bootstrapping and Evaluation
Verena Rieser | Oliver Lemon
Proceedings of ACL-08: HLT

pdf
Mixture Model POMDPs for Efficient Handling of Uncertainty in Dialogue Management
James Henderson | Oliver Lemon
Proceedings of ACL-08: HLT, Short Papers

pdf
Automatic Learning and Evaluation of User-Centered Objective Functions for Dialogue System Optimisation
Verena Rieser | Oliver Lemon
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The ultimate goal when building dialogue systems is to satisfy the needs of real users, but quality assurance for dialogue strategies is a non-trivial problem. The applied evaluation metrics and resulting design principles are often obscure, emerge by trial-and-error, and are highly context dependent. This paper introduces data-driven methods for obtaining reliable objective functions for system design. In particular, we test whether an objective function obtained from Wizard-of-Oz (WOZ) data is a valid estimate of real users’ preferences, via a test-retest comparison between the model obtained from the WOZ study and the models obtained when testing with real users. We can show that, despite a low fit to the initial data, the objective function obtained from WOZ data makes accurate predictions for automatic dialogue evaluation, and, when automatically optimising a policy using these predictions, the improvement over a strategy simply mimicking the data becomes clear from an error analysis.
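
Learning an objective function of this kind amounts to regressing user ratings onto interaction features and then reusing the fitted model as a reward estimate. A minimal sketch with synthetic features and ratings follows; the paper’s actual feature set and modelling choices are not reproduced.

```python
# Sketch of a data-driven objective function: regress ratings on features.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Invented feature columns: task success (0/1), length, n_clarifications.
X = np.column_stack([rng.integers(0, 2, 200),
                     rng.integers(4, 30, 200),
                     rng.integers(0, 5, 200)])
ratings = 3 + 2 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(0, 0.3, 200)

objective = LinearRegression().fit(X, ratings)
print("learned weights:", objective.coef_)
# A policy optimiser can now score candidate strategies with
# objective.predict(features) instead of querying real users each time.
```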

pdf
“Build Your Own” Spoken Dialogue Systems: Automatically Generating ISU Dialogue Systems from Business User Resources
Oliver Lemon | Xingkun Liu | Helen Hastie
Coling 2008: Companion volume: Demonstrations

pdf bib
Hybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Data Sets
James Henderson | Oliver Lemon | Kallirroi Georgila
Computational Linguistics, Volume 34, Number 4, December 2008

2007

pdf
Dialogue Policy Learning for Combinations of Noise and User Simulation: Transfer Results
Oliver Lemon | Xingkun Liu
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

2006

pdf
Learning More Effective Dialogue Strategies Using Limited Dialogue Move Features
Matthew Frampton | Oliver Lemon
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf
Using Machine Learning to Explore Human Multimodal Clarification Strategies
Verena Rieser | Oliver Lemon
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf
DUDE: A Dialogue and Understanding Development Environment, Mapping Business Process Models to Information State Update Dialogue Systems
Oliver Lemon | Xingkun Liu
Demonstrations

pdf
An ISU Dialogue System Exhibiting Reinforcement Learning of Dialogue Policies: Generic Slot-Filling in the TALK In-car System
Oliver Lemon | Kallirroi Georgila | James Henderson | Matthew Stuttle
Demonstrations

pdf
Evolving optimal inspectable strategies for spoken dialogue systems
Dave Toney | Johanna Moore | Oliver Lemon
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

2005

pdf
A Corpus Collection and Annotation Framework for Learning Multimodal Clarification Strategies
Verena Rieser | Ivana Kruijff-Korbayová | Oliver Lemon
Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue

2004

pdf
Combining Acoustic and Pragmatic Features to Predict Recognition Performance in Spoken Dialogue Systems
Malte Gabsdil | Oliver Lemon
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

pdf
Targeted Help for Spoken Dialogue Systems
Beth Ann Hockey | Oliver Lemon | Ellen Campana | Laura Hiatt | Gregory Aist | James Hieronymus | Alexander Gruenstein | John Dowding
10th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Managing Dialogue Interaction: A Multi-Layered Approach
Oliver Lemon | Lawrence Cavedon | Barbara Kelly
Proceedings of the Fourth SIGdial Workshop of Discourse and Dialogue

pdf
DIPPER: Description and Formalisation of an Information-State Update Dialogue System Architecture
Johan Bos | Ewan Klein | Oliver Lemon | Tetsushi Oka
Proceedings of the Fourth SIGdial Workshop of Discourse and Dialogue

pdf
Multi-Level Architecture for Natural Activity-Oriented Dialogue
Oliver Lemon | Lawrence Cavedon
Proceedings of the 2003 EACL Workshop on Dialogue Systems: interaction, adaptation and styles of management

2002

pdf
Multi-tasking and Collaborative Activities in Dialogue Systems
Oliver Lemon | Alexander Gruenstein | Alexis Battle | Stanley Peters
Proceedings of the Third SIGdial Workshop on Discourse and Dialogue

pdf
Probabilistic Dialogue Modelling
Oliver Lemon | Prashant Parikh | Stanley Peters
Proceedings of the Third SIGdial Workshop on Discourse and Dialogue

pdf
Language Resources for Multi-Modal Dialogue Systems.
Oliver Lemon | Alexander Gruenstein
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
