Michel Galley


2023

DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization
Yu Li | Baolin Peng | Pengcheng He | Michel Galley | Zhou Yu | Jianfeng Gao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Dialogue summarization has recently garnered significant attention due to its wide range of applications. However, existing methods for summarizing dialogues have limitations because they do not take into account the inherent structure of dialogue and rely heavily on labeled data, which can lead to poor performance in new domains. In this work, we propose DIONYSUS (dynamic input optimization in pre-training for dialogue summarization), a pre-trained encoder-decoder model for summarizing dialogues in any new domain. To pre-train DIONYSUS, we create two pseudo summaries for each dialogue example: one from a fine-tuned summarization model and the other from important dialogue turns. We then choose one of these pseudo summaries based on information distribution differences in different types of dialogues. This selected pseudo summary serves as the objective for pre-training DIONYSUS using a self-supervised approach on a large dialogue corpus. Our experiments show that DIONYSUS outperforms existing methods on six datasets, as demonstrated by its ROUGE scores in zero-shot and few-shot settings.

2022

Probing Factually Grounded Content Transfer with Factual Ablation
Peter West | Chris Quirk | Michel Galley | Yejin Choi
Findings of the Association for Computational Linguistics: ACL 2022

Despite recent success, large neural models often generate factually incorrect text. Compounding this is the lack of a standard automatic evaluation for factuality: it cannot be meaningfully improved if it cannot be measured. Grounded generation promises a path to solving both of these problems: models draw on a reliable external document (grounding) for factual information, simplifying the challenge of factuality. Measuring factuality is also simplified to factual consistency, that is, testing whether the generation agrees with the grounding rather than with all facts. Yet, without a standard automatic metric for factual consistency, factually grounded generation remains an open problem. We study this problem for content transfer, in which generations extend a prompt, using information from factual grounding. In particular, this domain allows us to introduce the notion of factual ablation for automatically measuring factual consistency: this captures the intuition that the model should be less likely to produce an output given a less relevant grounding document. In practice, we measure this by presenting a model with two grounding documents, and the model should prefer to use the more factually relevant one. We contribute two evaluation sets to measure this. Applying our new evaluation, we propose multiple novel methods improving over strong baselines.

Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation
Faeze Brahman | Baolin Peng | Michel Galley | Sudha Rao | Bill Dolan | Snigdha Chaturvedi | Jianfeng Gao
Findings of the Association for Computational Linguistics: EMNLP 2022

Large pre-trained language models have recently enabled open-ended generation frameworks (e.g., prompt-to-text NLG) to tackle a variety of tasks going beyond the traditional data-to-text generation. While this framework is more general, it is under-specified and often leads to a lack of controllability, restricting its real-world usage. We propose a new grounded keys-to-text generation task: the task is to generate a factual description about an entity given a set of guiding keys and grounding passages. To address this task, we introduce a new dataset, called EntDeGen. Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions. Our EntDescriptor model is equipped with strong rankers to fetch helpful passages and generate entity descriptions. Experimental results show a good correlation (60.14) between our proposed metric and human judgments of factuality. Our rankers significantly improved the factual correctness of generated descriptions (15.95% and 34.51% relative gains in recall and precision). Finally, our ablation study highlights the benefit of combining keys and groundings.

2021

Automatic Document Sketching: Generating Drafts from Analogous Texts
Zeqiu Wu | Michel Galley | Chris Brockett | Yizhe Zhang | Bill Dolan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Ask what’s missing and what’s useful: Improving Clarification Question Generation using Global Knowledge
Bodhisattwa Prasad Majumder | Sudha Rao | Michel Galley | Julian McAuley
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The ability to generate clarification questions, i.e., questions that identify useful missing information in a given context, is important in reducing ambiguity. Humans use previous experience with similar contexts to form a global view and compare it to the given context to ascertain what is missing and what is useful in the context. Inspired by this, we propose a model for clarification question generation where we first identify what is missing by taking a difference between the global and the local view and then train a model to identify what is useful and generate a question about it. Our model outperforms several baselines as judged by both automatic metrics and humans.

Text Editing by Command
Felix Faltings | Michel Galley | Gerold Hintz | Chris Brockett | Chris Quirk | Jianfeng Gao | Bill Dolan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A prevailing paradigm in neural text generation is one-shot generation, where text is produced in a single step. The one-shot setting is inadequate, however, when the constraints the user wishes to impose on the generated text are dynamic, especially when authoring longer documents. We address this limitation with an interactive text generation setting in which the user interacts with the system by issuing commands to edit existing text. To this end, we propose a novel text editing task, and introduce WikiDocEdits, a dataset of single-sentence edits crawled from Wikipedia. We show that our Interactive Editor, a transformer-based model trained on this dataset, outperforms baselines and obtains positive results in both automatic and human evaluations. We present empirical and qualitative analyses of this model’s performance.

2020

MixingBoard: a Knowledgeable Stylized Integrated Text Generation Platform
Xiang Gao | Michel Galley | Bill Dolan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present MixingBoard, a platform for quickly building demos with a focus on knowledge grounded stylized text generation. We unify existing text generation algorithms in a shared codebase and further adapt earlier algorithms for constrained generation. To borrow advantages from different models, we implement strategies for cross-model integration, from the token probability level to the latent space level. An interface to external knowledge is provided via a module that retrieves, on-the-fly, relevant knowledge from passages on the web or a document collection. A user interface for local development, remote webpage access, and a RESTful API are provided to make it simple for users to build their own demos.

DIALOGPT: Large-Scale Generative Pre-training for Conversational Response Generation
Yizhe Zhang | Siqi Sun | Michel Galley | Yen-Chun Chen | Chris Brockett | Xiang Gao | Jianfeng Gao | Jingjing Liu | Bill Dolan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present a large, tunable neural conversational response generation model, DIALOGPT (dialogue generative pre-trained transformer). Trained on 147M conversation-like exchanges extracted from Reddit comment chains over a period spanning from 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain a performance close to human both in terms of automatic and human evaluation in single-turn dialogue settings. We show that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems. The pre-trained model and training pipeline are publicly released to facilitate research into neural response generation and the development of more intelligent open-domain dialogue systems.

Dialogue Response Ranking Training with Large-Scale Human Feedback Data
Xiang Gao | Yizhe Zhang | Michel Galley | Chris Brockett | Bill Dolan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing open-domain dialog models are generally trained to minimize the perplexity of target human responses. However, some human replies are more engaging than others, spawning more followup interactions. Current conversational models are increasingly capable of producing turns that are context-relevant, but in order to produce compelling agents, these models need to be able to predict and optimize for turns that are genuinely engaging. We leverage social media feedback data (number of replies and upvotes) to build a large-scale training dataset for feedback prediction. To alleviate possible distortion between the feedback and engagingness, we convert the ranking problem to a comparison of response pairs which involve few confounding factors. We trained DialogRPT, a set of GPT-2 based models, on 133M pairs of human feedback data, and the resulting ranker outperformed several baselines. In particular, our ranker outperforms the conventional dialog perplexity baseline by a large margin on predicting Reddit feedback. We finally combine the feedback prediction models and a human-like scoring model to rank the machine-generated dialog responses. Crowd-sourced human evaluation shows that our ranking method correlates better with real human preferences than baseline models.

2019

Structuring Latent Spaces for Stylized Response Generation
Xiang Gao | Yizhe Zhang | Sungjin Lee | Michel Galley | Chris Brockett | Jianfeng Gao | Bill Dolan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Generating responses in a targeted style is a useful yet challenging task, especially in the absence of parallel data. With limited data, existing methods tend to generate responses that are either less stylized or less context-relevant. We propose StyleFusion, which bridges conversation modeling and non-parallel style transfer by sharing a structured latent space. This structure allows the system to generate stylized relevant responses by sampling in the neighborhood of the conversation model prediction, and continuously control the style level. We demonstrate this method using dialogues from Reddit data and two sets of sentences with distinct styles (arXiv and Sherlock Holmes novels). Automatic and human evaluation show that, without sacrificing appropriateness, the system generates responses of the targeted style and outperforms competitive baselines.

Jointly Optimizing Diversity and Relevance in Neural Response Generation
Xiang Gao | Sungjin Lee | Yizhe Zhang | Chris Brockett | Michel Galley | Jianfeng Gao | Bill Dolan
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Although recent neural conversation models have shown great potential, they often generate bland and generic responses. While various approaches have been explored to diversify the output of the conversation model, the improvement often comes at the cost of decreased relevance. In this paper, we propose a SpaceFusion model to jointly optimize diversity and relevance that essentially fuses the latent space of a sequence-to-sequence model and that of an autoencoder model by leveraging novel regularization terms. As a result, our approach induces a latent space in which the distance and direction from the predicted response vector roughly match the relevance and diversity, respectively. This property also lends itself well to an intuitive visualization of the latent space. Both automatic and human evaluation results demonstrate that the proposed approach brings significant improvement compared to strong baselines in both diversity and relevance.

Towards Content Transfer through Grounded Text Generation
Shrimai Prabhumoye | Chris Quirk | Michel Galley
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Recent work in neural generation has attracted significant interest in controlling the form of text, such as style, persona, and politeness. However, there has been less work on controlling neural text generation for content. This paper introduces the notion of Content Transfer for long-form text generation, where the task is to generate a next sentence in a document that both fits its context and is grounded in a content-rich external textual source such as a news story. Our experiments on Wikipedia data show significant improvements against competitive baselines. As another contribution of this paper, we release a benchmark dataset of 640k Wikipedia referenced sentences paired with the source articles to encourage exploration of this new task.

Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading
Lianhui Qin | Michel Galley | Chris Brockett | Xiaodong Liu | Xiang Gao | Bill Dolan | Yejin Choi | Jianfeng Gao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Although neural conversational models are effective in learning how to produce fluent responses, their primary challenge lies in knowing what to say to make the conversation contentful and non-vacuous. We present a new end-to-end approach to contentful neural conversation that jointly models response generation and on-demand machine reading. The key idea is to provide the conversation model with relevant long-form text on the fly as a source of external knowledge. The model performs QA-style reading comprehension on this text in response to each conversational turn, thereby allowing for more focused integration of external knowledge than has been possible in prior approaches. To support further research on knowledge-grounded conversation, we introduce a new large-scale conversation dataset grounded in external web pages (2.8M turns, 7.4M sentences of grounding). Both human evaluation and automated metrics show that our approach results in more contentful responses compared to a variety of previous methods, improving both the informativeness and diversity of generated output.

Microsoft Icecaps: An Open-Source Toolkit for Conversation Modeling
Vighnesh Leonardo Shiv | Chris Quirk | Anshuman Suri | Xiang Gao | Khuram Shahid | Nithya Govindarajan | Yizhe Zhang | Jianfeng Gao | Michel Galley | Chris Brockett | Tulasi Menon | Bill Dolan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

The Intelligent Conversation Engine: Code and Pre-trained Systems (Microsoft Icecaps) is an upcoming open-source natural language processing repository. Icecaps wraps TensorFlow functionality in a modular component-based architecture, presenting an intuitive and flexible paradigm for constructing sophisticated learning setups. Capabilities include multitask learning between models with shared parameters, upgraded language model decoding features, a range of built-in architectures, and a user-friendly data processing pipeline. The system is targeted toward conversational tasks, exploring diverse response generation, coherence, and knowledge grounding. Icecaps also provides pre-trained conversational models that can be either used directly or loaded for fine-tuning or bootstrapping other models; these models power an online demo of our framework.

Towards Coherent and Cohesive Long-form Text Generation
Woon Sang Cho | Pengchuan Zhang | Yizhe Zhang | Xiujun Li | Michel Galley | Chris Brockett | Mengdi Wang | Jianfeng Gao
Proceedings of the First Workshop on Narrative Understanding

Generating coherent and cohesive long-form texts is a challenging task. Previous works relied on large amounts of human-generated texts to train neural language models. However, few attempted to explicitly improve neural language models from the perspectives of coherence and cohesion. In this work, we propose a new neural language model that is equipped with two neural discriminators which provide feedback signals at the levels of sentence (cohesion) and paragraph (coherence). Our model is trained using a simple yet efficient variant of policy gradient, called ‘negative-critical sequence training’, which is proposed to eliminate the need to train a separate critic for estimating the ‘baseline’. Results demonstrate the effectiveness of our approach, showing improvements over a strong baseline: a recurrent attention-based bidirectional MLE-trained neural language model.

2018

Neural Approaches to Conversational AI
Jianfeng Gao | Michel Galley | Lihong Li
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

This tutorial surveys neural approaches to conversational AI that were developed in the last few years. We group conversational systems into three categories: (1) question answering agents, (2) task-oriented dialogue agents, and (3) social bots. For each category, we present a review of state-of-the-art neural approaches, draw the connection between neural approaches and traditional symbolic approaches, and discuss the progress we have made and challenges we are facing, using specific systems and models as case studies.

2017

Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation
Nasrin Mostafazadeh | Chris Brockett | Bill Dolan | Michel Galley | Jianfeng Gao | Georgios Spithourakis | Lucy Vanderwende
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The popularity of image sharing on social media and the engagement it creates between users reflect the important role that visual context plays in everyday conversations. We present a novel task, Image Grounded Conversations (IGC), in which natural-sounding conversations are generated about a shared image. To benchmark progress, we introduce a new multiple reference dataset of crowd-sourced, event-centric conversations on images. IGC falls on the continuum between chit-chat and goal-directed conversation models, where visual grounding constrains the topic of conversation to event-driven utterances. Experiments with models trained on social media data show that the combination of visual and textual context enhances the quality of generated conversational turns. In human evaluation, the gap between human performance and that of both neural and retrieval architectures suggests that multi-modal IGC presents an interesting challenge for dialog research.

Multi-Task Learning for Speaker-Role Adaptation in Neural Conversation Models
Yi Luan | Chris Brockett | Bill Dolan | Jianfeng Gao | Michel Galley
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Building a persona-based conversation agent is challenging owing to the lack of large amounts of speaker-specific conversation data for model training. This paper addresses the problem by proposing a multi-task learning approach to training neural conversation models that leverages both conversation data across speakers and other types of data pertaining to the speaker and speaker roles to be modeled. Experiments show that our approach leads to significant improvements over baseline model quality, generating responses that capture more precisely speakers’ traits and speaking styles. The model offers the benefits of being algorithmically simple and easy to implement, and not relying on large quantities of data representing specific individual speakers.

2016

Deep Reinforcement Learning for Dialogue Generation
Jiwei Li | Will Monroe | Alan Ritter | Dan Jurafsky | Michel Galley | Jianfeng Gao
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

A Diversity-Promoting Objective Function for Neural Conversation Models
Jiwei Li | Michel Galley | Chris Brockett | Jianfeng Gao | Bill Dolan
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Visual Storytelling
Ting-Hao Kenneth Huang | Francis Ferraro | Nasrin Mostafazadeh | Ishan Misra | Aishwarya Agrawal | Jacob Devlin | Ross Girshick | Xiaodong He | Pushmeet Kohli | Dhruv Batra | C. Lawrence Zitnick | Devi Parikh | Lucy Vanderwende | Michel Galley | Margaret Mitchell
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A Persona-Based Neural Conversation Model
Jiwei Li | Michel Galley | Chris Brockett | Georgios Spithourakis | Jianfeng Gao | Bill Dolan
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

A Discriminative Model for Semantics-to-String Translation
Aleš Tamchyna | Chris Quirk | Michel Galley
Proceedings of the 1st Workshop on Semantics-Driven Statistical Machine Translation (S2MT 2015)

A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
Alessandro Sordoni | Michel Galley | Michael Auli | Chris Brockett | Yangfeng Ji | Margaret Mitchell | Jian-Yun Nie | Jianfeng Gao | Bill Dolan
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Language to Code: Learning Semantic Parsers for If-This-Then-That Recipes
Chris Quirk | Raymond Mooney | Michel Galley
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets
Michel Galley | Chris Brockett | Alessandro Sordoni | Yangfeng Ji | Michael Auli | Chris Quirk | Margaret Mitchell | Jianfeng Gao | Bill Dolan
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

A Survey of Current Datasets for Vision and Language Research
Francis Ferraro | Nasrin Mostafazadeh | Ting-Hao Huang | Lucy Vanderwende | Jacob Devlin | Michel Galley | Margaret Mitchell
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

Large-scale Expected BLEU Training of Phrase-based Reordering Models
Michael Auli | Michel Galley | Jianfeng Gao
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

See No Evil, Say No Evil: Description Generation from Densely Labeled Images
Mark Yatskar | Michel Galley | Lucy Vanderwende | Luke Zettlemoyer
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)

2013

Joint Language and Translation Modeling with Recurrent Neural Networks
Michael Auli | Michel Galley | Chris Quirk | Geoffrey Zweig
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Regularized Minimum Error Rate Training
Michel Galley | Chris Quirk | Colin Cherry | Kristina Toutanova
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2012

Direct Error Rate Minimization for Statistical Machine Translation
Tagyoung Chung | Michel Galley
Proceedings of the Seventh Workshop on Statistical Machine Translation

2011

Optimal Search for Minimum Error Rate Training
Michel Galley | Chris Quirk
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity
Kristina Toutanova | Michel Galley
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Improved Models of Distortion Cost for Statistical Machine Translation
Spence Green | Michel Galley | Christopher D. Manning
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Accurate Non-Hierarchical Phrase-Based Translation
Michel Galley | Christopher D. Manning
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Phrasal: A Statistical Machine Translation Toolkit for Exploring New Model Features
Daniel Cer | Michel Galley | Daniel Jurafsky | Christopher D. Manning
Proceedings of the NAACL HLT 2010 Demonstration Session

2009

Robust Machine Translation Evaluation with Entailment Features
Sebastian Padó | Michel Galley | Dan Jurafsky | Christopher D. Manning
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Quadratic-Time Dependency Parsing for Machine Translation
Michel Galley | Christopher D. Manning
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Machine Translation Evaluation with Textual Entailment Features
Sebastian Padó | Michel Galley | Daniel Jurafsky | Christopher D. Manning
Proceedings of the Fourth Workshop on Statistical Machine Translation

2008

Optimizing Chinese Word Segmentation for Machine Translation Performance
Pi-Chuan Chang | Michel Galley | Christopher D. Manning
Proceedings of the Third Workshop on Statistical Machine Translation

A Phrase-Based Alignment Model for Natural Language Inference
Bill MacCartney | Michel Galley | Christopher D. Manning
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

A Simple and Effective Hierarchical Phrase Reordering Model
Michel Galley | Christopher D. Manning
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

Lexicalized Markov Grammars for Sentence Compression
Michel Galley | Kathleen McKeown
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

2006

Scalable Inference and Training of Context-Rich Syntactic Translation Models
Michel Galley | Jonathan Graehl | Kevin Knight | Daniel Marcu | Steve DeNeefe | Wei Wang | Ignacio Thayer
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

A Skip-Chain Conditional Random Field for Ranking Meeting Utterances by Importance
Michel Galley
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2004

What’s in a translation rule?
Michel Galley | Mark Hopkins | Kevin Knight | Daniel Marcu
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

Identifying Agreement and Disagreement in Conversational Speech: Use of Bayesian Networks to Model Pragmatic Dependencies
Michel Galley | Kathleen McKeown | Julia Hirschberg | Elizabeth Shriberg
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

Discourse Segmentation of Multi-Party Conversation
Michel Galley | Kathleen R. McKeown | Eric Fosler-Lussier | Hongyan Jing
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics