Ryan Lowe


Learning an Unreferenced Metric for Online Dialogue Evaluation
Koustuv Sinha | Prasanna Parthasarathi | Jasmine Wang | Ryan Lowe | William L. Hamilton | Joelle Pineau
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Evaluating the quality of a dialogue interaction between two agents is a difficult task, especially in open-domain chit-chat style dialogue. There have been recent efforts to develop automatic dialogue evaluation metrics, but most of them do not generalize to unseen datasets and/or need a human-generated reference response during inference, making it infeasible for online evaluation. Here, we propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances, and leverages the temporal transitions that exist between them. We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.


Seeded self-play for language learning
Abhinav Gupta | Ryan Lowe | Jakob Foerster | Douwe Kiela | Joelle Pineau
Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)

How can we teach artificial agents to use human language flexibly to solve problems in real-world environments? We have an example of this in nature: human babies eventually learn to use human language to solve problems, and they are taught with an adult human-in-the-loop. Unfortunately, current machine learning methods (e.g. from deep reinforcement learning) are too data inefficient to learn language in this way. An outstanding goal is finding an algorithm with a suitable ‘language learning prior’ that allows it to learn human language, while minimizing the number of on-policy human interactions. In this paper, we propose to learn such a prior in simulation using an approach we call, Learning to Learn to Communicate (L2C). Specifically, in L2C we train a meta-learning agent in simulation to interact with populations of pre-trained agents, each with their own distinct communication protocol. Once the meta-learning agent is able to quickly adapt to each population of agents, it can be deployed in new populations, including populations speaking human language. Our key insight is that such populations can be obtained via self-play, after pre-training agents with imitation learning on a small amount of off-policy human language data. We call this latter technique Seeded Self-Play (S2P). Our preliminary experiments show that agents trained with L2C and S2P need fewer on-policy samples to learn a compositional language in a Lewis signaling game.


Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses
Ryan Lowe | Michael Noseworthy | Iulian Vlad Serban | Nicolas Angelard-Gontier | Yoshua Bengio | Joelle Pineau
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem. Unfortunately, existing automatic evaluation metrics are biased and correlate very poorly with human judgements of response quality (Liu et al., 2016). Yet having an accurate automatic evaluation procedure is crucial for dialogue research, as it allows rapid prototyping and testing of new models with fewer expensive human evaluations. In response to this challenge, we formulate automatic dialogue evaluation as a learning problem.We present an evaluation model (ADEM)that learns to predict human-like scores to input responses, using a new dataset of human response scores. We show that the ADEM model’s predictions correlate significantly, and at a level much higher than word-overlap metrics such as BLEU, with human judgements at both the utterance and system-level. We also show that ADEM can generalize to evaluating dialogue mod-els unseen during training, an important step for automatic dialogue evaluation.

World Knowledge for Reading Comprehension: Rare Entity Prediction with Hierarchical LSTMs Using External Descriptions
Teng Long | Emmanuel Bengio | Ryan Lowe | Jackie Chi Kit Cheung | Doina Precup
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Humans interpret texts with respect to some background information, or world knowledge, and we would like to develop automatic reading comprehension systems that can do the same. In this paper, we introduce a task and several models to drive progress towards this goal. In particular, we propose the task of rare entity prediction: given a web document with several entities removed, models are tasked with predicting the correct missing entities conditioned on the document context and the lexical resources. This task is challenging due to the diversity of language styles and the extremely large number of rare entities. We propose two recurrent neural network architectures which make use of external knowledge in the form of entity descriptions. Our experiments show that our hierarchical LSTM model performs significantly better at the rare entity prediction task than those that do not make use of external resources.


How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation
Chia-Wei Liu | Ryan Lowe | Iulian Serban | Mike Noseworthy | Laurent Charlin | Joelle Pineau
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Leveraging Lexical Resources for Learning Entity Embeddings in Multi-Relational Data
Teng Long | Ryan Lowe | Jackie Chi Kit Cheung | Doina Precup
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

On the Evaluation of Dialogue Systems with Next Utterance Classification
Ryan Lowe | Iulian Vlad Serban | Michael Noseworthy | Laurent Charlin | Joelle Pineau
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue


The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems
Ryan Lowe | Nissan Pow | Iulian Serban | Joelle Pineau
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue