Chris Quirk

2022

Probing Factually Grounded Content Transfer with Factual Ablation
Peter West | Chris Quirk | Michel Galley | Yejin Choi
Findings of the Association for Computational Linguistics: ACL 2022

Despite recent success, large neural models often generate factually incorrect text. Compounding this is the lack of a standard automatic evaluation for factuality: it cannot be meaningfully improved if it cannot be measured. Grounded generation promises a path to solving both of these problems: models draw on a reliable external document (grounding) for factual information, simplifying the challenge of factuality. Measuring factuality is also simplified to factual consistency: testing whether the generation agrees with the grounding, rather than with all facts. Yet, without a standard automatic metric for factual consistency, factually grounded generation remains an open problem. We study this problem for content transfer, in which generations extend a prompt, using information from factual grounding. In particular, this domain allows us to introduce the notion of factual ablation for automatically measuring factual consistency: this captures the intuition that the model should be less likely to produce an output given a less relevant grounding document. In practice, we measure this by presenting a model with two grounding documents; the model should prefer to use the more factually relevant one. We contribute two evaluation sets to measure this. Applying our new evaluation, we propose multiple novel methods improving over strong baselines.

2021

A prevailing paradigm in neural text generation is one-shot generation, where text is produced in a single step. The one-shot setting is inadequate, however, when the constraints the user wishes to impose on the generated text are dynamic, especially when authoring longer documents. We address this limitation with an interactive text generation setting in which the user interacts with the system by issuing commands to edit existing text. To this end, we propose a novel text editing task, and introduce WikiDocEdits, a dataset of single-sentence edits crawled from Wikipedia. We show that our Interactive Editor, a transformer-based model trained on this dataset, outperforms baselines and obtains positive results in both automatic and human evaluations. We present empirical and qualitative analyses of this model’s performance.

When does text prediction benefit from additional context? An exploration of contextual signals for chat and email messages
Stojan Trajanovski | Chad Atalla | Kunho Kim | Vipul Agarwal | Milad Shokouhi | Chris Quirk
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Email and chat communication tools are increasingly important for completing daily tasks. Accurate real-time phrase completion can save time and bolster productivity. Modern text prediction algorithms are based on large language models, which typically rely on the prior words in a message to predict a completion. We examine how additional contextual signals (from previous messages, time, and subject) affect the performance of a commercial text prediction model. We compare contextual text prediction in chat and email messages from two of the largest commercial platforms, Microsoft Teams and Outlook, finding that contextual signals contribute to performance differently between these scenarios. On emails, time context is most beneficial, with small relative gains of 2% over the baseline. In chat scenarios, by contrast, using a tailored set of previous messages as context yields relative improvements over the baseline of between 9.3% and 18.6% across various critical service-oriented text prediction metrics.

2020

In this paper, we detail novel strategies for interpolating personalized language models and methods for handling out-of-vocabulary (OOV) tokens to improve personalized language models. Using publicly available data from Reddit, we demonstrate improvements in offline metrics at the user level by interpolating a global LSTM-based authoring model with a user-personalized n-gram model. By optimizing this approach with a back-off to uniform OOV penalty and the interpolation coefficient, we observe that over 80% of users receive a lift in perplexity, with an average perplexity lift of 5.4% per user. In doing this research we extend previous work in building NLIs and improve the robustness of metrics for downstream tasks.

2019

Multilingual Whispers: Generating Paraphrases with Translation
Christian Federmann | Oussama Elachqar | Chris Quirk
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

Naturally occurring paraphrase data, such as multiple news stories about the same event, is a useful but rare resource. This paper compares translation-based paraphrase gathering using human, automatic, or hybrid techniques to monolingual paraphrasing by experts and non-experts. We gather translations, paraphrases, and empirical human quality assessments of these approaches. Neural machine translation techniques, especially when pivoting through related languages, provide a relatively robust source of paraphrases with diversity comparable to expert human paraphrases. Surprisingly, human translators do not reliably outperform neural systems. The resulting data release will not only be a useful test set, but will also allow additional explorations in translation and paraphrase quality assessments and relationships.

Towards Content Transfer through Grounded Text Generation
Shrimai Prabhumoye | Chris Quirk | Michel Galley
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Recent work in neural generation has attracted significant interest in controlling the form of text, such as style, persona, and politeness. However, there has been less work on controlling neural text generation for content. This paper introduces the notion of Content Transfer for long-form text generation, where the task is to generate a next sentence in a document that both fits its context and is grounded in a content-rich external textual source such as a news story. Our experiments on Wikipedia data show significant improvements against competitive baselines. As another contribution of this paper, we release a benchmark dataset of 640k Wikipedia referenced sentences paired with the source articles to encourage exploration of this new task.

The Intelligent Conversation Engine: Code and Pre-trained Systems (Microsoft Icecaps) is an upcoming open-source natural language processing repository. Icecaps wraps TensorFlow functionality in a modular component-based architecture, presenting an intuitive and flexible paradigm for constructing sophisticated learning setups. Capabilities include multitask learning between models with shared parameters, upgraded language model decoding features, a range of built-in architectures, and a user-friendly data processing pipeline. The system is targeted toward conversational tasks, exploring diverse response generation, coherence, and knowledge grounding. Icecaps also provides pre-trained conversational models that can be either used directly or loaded for fine-tuning or bootstrapping other models; these models power an online demo of our framework.

2018

Confidence Modeling for Neural Semantic Parsing
Li Dong | Chris Quirk | Mirella Lapata
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this work we focus on confidence modeling for neural semantic parsers that are built upon sequence-to-sequence models. We outline three major causes of uncertainty, and design various metrics to quantify these factors. These metrics are then used to estimate confidence scores that indicate whether model predictions are likely to be correct. Beyond confidence estimation, we identify which parts of the input contribute to uncertain predictions, allowing users to interpret their model and verify or refine its input. Experimental results show that our confidence model significantly outperforms a widely used method that relies on posterior probability, and improves the quality of interpretation compared to simply relying on attention scores.

Assigning people to tasks identified in email: The EPA dataset for addressee tagging for detected task intent
Revanth Rameshkumar | Peter Bailey | Abhishek Jha | Chris Quirk
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

We describe the Enron People Assignment (EPA) dataset, in which tasks that are described in emails are associated with the person(s) responsible for carrying out these tasks. We identify tasks and the responsible people in the Enron email dataset. We define evaluation methods for this challenge and report scores for our model and naïve baselines. The resulting model enables a user experience operating within a commercial email service: given a person and a task, it determines if the person should be notified of the task.

2017

Distant Supervision for Relation Extraction beyond the Sentence Boundary
Chris Quirk | Hoifung Poon
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

The growing demand for structured knowledge has led to great interest in relation extraction, especially in cases with limited supervision. However, existing distant supervision approaches only extract relations expressed in single sentences. In general, cross-sentence relation extraction is under-explored, even in the supervised-learning setting. In this paper, we propose the first approach for applying distant supervision to cross-sentence relation extraction. At the core of our approach is a graph representation that can incorporate both standard dependencies and discourse relations, thus providing a unifying way to model relations within and across sentences. We extract features from multiple paths in this graph, increasing accuracy and robustness when confronted with linguistic variation and analysis error. Experiments on an important extraction task for precision medicine show that our approach can learn an accurate cross-sentence extractor, using only a small existing knowledge base and unlabeled text from biomedical research articles. Compared to the existing distant supervision paradigm, our approach extracted twice as many relations at similar precision, thus demonstrating the prevalence of cross-sentence relations and the promise of our approach.

NLP for Precision Medicine
Hoifung Poon | Chris Quirk | Kristina Toutanova | Wen-tau Yih
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

We will introduce precision medicine and showcase the vast opportunities for NLP in this burgeoning field with great societal impact. We will review pressing NLP problems, state-of-the-art methods, and important applications, as well as datasets, medical resources, and practical issues. The tutorial will provide an accessible overview of biomedicine, and does not presume knowledge of biology or healthcare. The ultimate goal is to reduce the entry barrier for NLP researchers to contribute to this exciting domain.

Cross-Sentence N-ary Relation Extraction with Graph LSTMs
Nanyun Peng | Hoifung Poon | Chris Quirk | Kristina Toutanova | Wen-tau Yih
Transactions of the Association for Computational Linguistics, Volume 5

Past work in relation extraction has focused on binary relations in single sentences. Recent NLP inroads in high-value domains have sparked interest in the more general setting of extracting n-ary relations that span multiple sentences. In this paper, we explore a general relation extraction framework based on graph long short-term memory networks (graph LSTMs) that can be easily extended to cross-sentence n-ary relation extraction. The graph formulation provides a unified way of exploring different LSTM approaches and incorporating various intra-sentential and inter-sentential dependencies, such as sequential, syntactic, and discourse relations. A robust contextual representation is learned for the entities, which serves as input to the relation classifier. This simplifies handling of relations with arbitrary arity, and enables multi-task learning with related relations. We evaluate this framework in two important precision medicine settings, demonstrating its effectiveness with both conventional supervised learning and distant supervision. Cross-sentence extraction produced larger knowledge bases, and multi-task learning significantly improved extraction accuracy. A thorough analysis of various LSTM approaches yielded useful insight into the impact of linguistic analysis on extraction accuracy.

This paper describes successful applications of discriminative lexicon models to statistical machine translation (SMT) systems translating into morphologically complex languages. We extend previous work on discriminatively trained lexicon models to include more contextual information in making lexical selection decisions by building a single global log-linear model of translation selection. In offline experiments, we show that the use of the expanded contextual information, including morphological and syntactic features, helps better predict words in three target languages with complex morphology (Bulgarian, Czech and Korean). We also show that these improved lexical prediction models have a positive impact in the end-to-end SMT scenario from English to these languages.

Learning Phrase-Based Spelling Error Models from Clickthrough Data
Xu Sun | Jianfeng Gao | Daniel Micol | Chris Quirk
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Top-Down K-Best A* Parsing
Adam Pauls | Dan Klein | Chris Quirk
Proceedings of the ACL 2010 Conference Short Papers

2009

Improved Smoothing for N-gram Language Models Based on Ordinary Counts
Robert C. Moore | Chris Quirk
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Less is More: Significance-Based N-gram Selection for Smaller, Better Language Models
Robert C. Moore | Chris Quirk
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

Discriminative, Syntactic Language Modeling through Latent SVMs
Colin Cherry | Chris Quirk
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

We construct a discriminative, syntactic language model (LM) by using a latent support vector machine (SVM) to train an unlexicalized parser to judge sentences. That is, the parser is optimized so that correct sentences receive high-scoring trees, while incorrect sentences do not. Because of this alternative objective, the parser can be trained with only a part-of-speech dictionary and binary-labeled sentences. We follow the paradigm of discriminative language modeling with pseudo-negative examples (Okanohara and Tsujii, 2007), and demonstrate significant improvements in distinguishing real sentences from pseudo-negatives. We also investigate the related task of separating machine-translation (MT) outputs from reference translations, again showing large improvements. Finally, we test our LM in MT reranking, and investigate the language-modeling parser in the context of unsupervised parsing.

Syntactic Models for Structural Word Insertion and Deletion during Translation
Arul Menezes | Chris Quirk
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
Hao Zhang | Chris Quirk | Robert C. Moore | Daniel Gildea
Proceedings of ACL-08: HLT

Random Restarts in Minimum Error Rate Training for Statistical Machine Translation
Robert C. Moore | Chris Quirk
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

Faster beam-search decoding for phrasal statistical machine translation
Robert C. Moore | Chris Quirk
Proceedings of Machine Translation Summit XI: Papers

Generative models of noisy translations with applications to parallel fragment extraction
Chris Quirk | Raghavendra Udupa U. | Arul Menezes
Proceedings of Machine Translation Summit XI: Papers

Using Dependency Order Templates to Improve Generality in Translation
Arul Menezes | Chris Quirk
Proceedings of the Second Workshop on Statistical Machine Translation

An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation
Robert Moore | Chris Quirk
Proceedings of the Second Workshop on Statistical Machine Translation

2006

The impact of parse quality on syntactically-informed statistical machine translation
Chris Quirk | Simon Corston-Oliver
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

Microsoft Research Treelet Translation System: NAACL 2006 Europarl Evaluation
Arul Menezes | Kristina Toutanova | Chris Quirk
Proceedings on the Workshop on Statistical Machine Translation

Do we need phrases? Challenging the conventional wisdom in Statistical Machine Translation
Chris Quirk | Arul Menezes
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2005

Microsoft Research Treelet Translation System: IWSLT Evaluation
Arul Menezes | Chris Quirk
Proceedings of the Second International Workshop on Spoken Language Translation

Dependency Treelet Translation: Syntactically Informed Phrasal SMT
Chris Quirk | Arul Menezes | Colin Cherry
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

Dependency Treelet Translation: The Convergence of Statistical and Example-based Machine-translation?
Arul Menezes | Chris Quirk
Workshop on example-based machine translation

We describe a novel approach to machine translation that combines the strengths of the two leading corpus-based approaches: Phrasal SMT and EBMT. We use a syntactically informed decoder and reordering model based on the source dependency tree, in combination with conventional SMT models to incorporate the power of phrasal SMT with the linguistic generality available in a parser. We show that this approach significantly outperforms a leading string-based Phrasal SMT decoder and an EBMT system. We present results from two radically different language pairs, and investigate the sensitivity of this approach to parse quality by using two distinct parsers and oracle experiments. We also validate our automated BLEU scores with a small human evaluation.

2004

Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources
Bill Dolan | Chris Quirk | Chris Brockett
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Monolingual Machine Translation for Paraphrase Generation
Chris Quirk | Chris Brockett | William Dolan
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

Statistical machine translation using labeled semantic dependency graphs
Anthony Aue | Arul Menezes | Bob Moore | Chris Quirk | Eric Ringger
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

2003

Disambiguation of English PP attachment using multilingual aligned data
Lee Schwartz | Takako Aikawa | Chris Quirk
Proceedings of Machine Translation Summit IX: Papers

Prepositional phrase attachment (PP attachment) is a major source of ambiguity in English. It poses a substantial challenge to Machine Translation (MT) between English and languages that are not characterized by PP attachment ambiguity. In this paper we present an unsupervised, bilingual, corpus-based approach to the resolution of English PP attachment ambiguity. As data we use aligned linguistic representations of the English and Japanese sentences from a large parallel corpus of technical texts. The premise of our approach is that with large aligned, parsed, bilingual (or multilingual) corpora, languages can learn non-trivial linguistic information from one another with high accuracy. We contend that our approach can be extended to linguistic phenomena other than PP attachment.