Chris Quirk


2022

Probing Factually Grounded Content Transfer with Factual Ablation
Peter West | Chris Quirk | Michel Galley | Yejin Choi
Findings of the Association for Computational Linguistics: ACL 2022

Despite recent success, large neural models often generate factually incorrect text. Compounding this is the lack of a standard automatic evaluation for factuality: it cannot be meaningfully improved if it cannot be measured. Grounded generation promises a path to solving both of these problems: models draw on a reliable external document (grounding) for factual information, simplifying the challenge of factuality. Measuring factuality is also simplified to measuring factual consistency, that is, testing whether the generation agrees with the grounding rather than with all facts. Yet, without a standard automatic metric for factual consistency, factually grounded generation remains an open problem. We study this problem for content transfer, in which generations extend a prompt, using information from factual grounding. In particular, this domain allows us to introduce the notion of factual ablation for automatically measuring factual consistency: this captures the intuition that the model should be less likely to produce an output given a less relevant grounding document. In practice, we measure this by presenting a model with two grounding documents, and the model should prefer to use the more factually relevant one. We contribute two evaluation sets to measure this. Applying our new evaluation, we propose multiple novel methods that improve over strong baselines.
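The factual ablation check described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_log_likelihood` is a hypothetical smoothed unigram scorer standing in for a real generation model, and `passes_factual_ablation` is a hypothetical helper name; neither comes from the paper. The only idea taken from the abstract is the comparison itself: the output should be more likely under the more relevant grounding than under the less relevant one.

```python
import math
from collections import Counter


def toy_log_likelihood(output, prompt, grounding, alpha=1.0):
    """Hypothetical stand-in for log p(output | prompt, grounding).

    Uses an add-alpha smoothed unigram model estimated from the prompt
    and grounding tokens. A real system would query a neural generator.
    """
    context = (prompt + " " + grounding).lower().split()
    out_tokens = output.lower().split()
    counts = Counter(context)
    vocab = set(context) | set(out_tokens)
    total = len(context) + alpha * len(vocab)
    return sum(math.log((counts[tok] + alpha) / total) for tok in out_tokens)


def passes_factual_ablation(output, prompt, relevant, ablated,
                            score=toy_log_likelihood):
    """True if the output is more likely given the relevant grounding
    than given the less relevant (ablated) grounding."""
    return score(output, prompt, relevant) > score(output, prompt, ablated)


prompt = "the bridge opened in"
output = "the bridge opened in 1932"
relevant = "the harbour bridge was opened to traffic in 1932"
ablated = "the harbour bridge is a steel arch bridge"
print(passes_factual_ablation(output, prompt, relevant, ablated))
```

Passing a different `score` function is how a real model's conditional log-likelihood would be plugged in; the toy scorer only rewards token overlap with the grounding.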

2021

Text Editing by Command
Felix Faltings | Michel Galley | Gerold Hintz | Chris Brockett | Chris Quirk | Jianfeng Gao | Bill Dolan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A prevailing paradigm in neural text generation is one-shot generation, where text is produced in a single step. The one-shot setting is inadequate, however, when the constraints the user wishes to impose on the generated text are dynamic, especially when authoring longer documents. We address this limitation with an interactive text generation setting in which the user interacts with the system by issuing commands to edit existing text. To this end, we propose a novel text editing task, and introduce WikiDocEdits, a dataset of single-sentence edits crawled from Wikipedia. We show that our Interactive Editor, a transformer-based model trained on this dataset, outperforms baselines and obtains positive results in both automatic and human evaluations. We present empirical and qualitative analyses of this model’s performance.

When does text prediction benefit from additional context? An exploration of contextual signals for chat and email messages
Stojan Trajanovski | Chad Atalla | Kunho Kim | Vipul Agarwal | Milad Shokouhi | Chris Quirk
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Email and chat communication tools are increasingly important for completing daily tasks. Accurate real-time phrase completion can save time and bolster productivity. Modern text prediction algorithms are based on large language models, which typically rely on the prior words in a message to predict a completion. We examine how additional contextual signals (from previous messages, time, and subject) affect the performance of a commercial text prediction model. We compare contextual text prediction in chat and email messages from two of the largest commercial platforms, Microsoft Teams and Outlook, finding that contextual signals contribute to performance differently between these scenarios. On emails, time context is most beneficial, with small relative gains of 2% over the baseline. In chat scenarios, by contrast, using a tailored set of previous messages as context yields relative improvements over the baseline of between 9.3% and 18.6% across various critical service-oriented text prediction metrics.

2020

Examination and Extension of Strategies for Improving Personalized Language Modeling via Interpolation
Liqun Shao | Sahitya Mantravadi | Tom Manzini | Alejandro Buendia | Manon Knoertzer | Soundar Srinivasan | Chris Quirk
Proceedings of the First Workshop on Natural Language Interfaces

In this paper, we detail novel strategies for interpolating personalized language models, together with methods for handling out-of-vocabulary (OOV) tokens. Using publicly available data from Reddit, we demonstrate improvements in offline metrics at the user level by interpolating a global LSTM-based authoring model with a user-personalized n-gram model. By optimizing this approach with a back-off to uniform OOV penalty and the interpolation coefficient, we observe that over 80% of users receive a lift in perplexity, with an average perplexity lift of 5.4% per user. In doing this research we extend previous work in building NLIs and improve the robustness of metrics for downstream tasks.
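As a rough illustration of the interpolation idea in this abstract, the sketch below linearly mixes a global model with a user model and falls back to a uniform probability for tokens a model has not seen. It is a toy under stated assumptions, not the paper's method: both models are plain unigram dictionaries rather than an LSTM and an n-gram model, the names `interpolated_prob` and `perplexity` are hypothetical, and the uniform fallback is only one possible reading of "back-off to uniform OOV penalty".

```python
import math


def interpolated_prob(token, p_global, p_user, lam, vocab_size):
    """Return lam * p_user(token) + (1 - lam) * p_global(token).

    Tokens missing from a model fall back to a uniform 1 / vocab_size
    (an assumed OOV treatment; the toy distributions are not renormalized).
    """
    uniform = 1.0 / vocab_size
    pu = p_user.get(token, uniform)
    pg = p_global.get(token, uniform)
    return lam * pu + (1.0 - lam) * pg


def perplexity(tokens, p_global, p_user, lam, vocab_size):
    """Per-token perplexity of a token list under the interpolated model."""
    log_sum = sum(
        math.log(interpolated_prob(t, p_global, p_user, lam, vocab_size))
        for t in tokens
    )
    return math.exp(-log_sum / len(tokens))


# Toy data: the user says "howdy", which the global model has never seen.
p_global = {"hello": 0.5, "world": 0.5}
p_user = {"hello": 0.1, "howdy": 0.9}
message = ["howdy", "howdy"]

global_only = perplexity(message, p_global, p_user, lam=0.0, vocab_size=10)
mixed = perplexity(message, p_global, p_user, lam=0.5, vocab_size=10)
print(global_only, mixed)
```

In this toy setup, the interpolation coefficient `lam` is the knob that would be tuned per user; lower perplexity on held-out user text indicates a better mix.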

2019

Multilingual Whispers: Generating Paraphrases with Translation
Christian Federmann | Oussama Elachqar | Chris Quirk
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

Naturally occurring paraphrase data, such as multiple news stories about the same event, is a useful but rare resource. This paper compares translation-based paraphrase gathering using human, automatic, or hybrid techniques to monolingual paraphrasing by experts and non-experts. We gather translations, paraphrases, and empirical human quality assessments of these approaches. Neural machine translation techniques, especially when pivoting through related languages, provide a relatively robust source of paraphrases with diversity comparable to expert human paraphrases. Surprisingly, human translators do not reliably outperform neural systems. The resulting data release will not only be a useful test set, but will also allow additional explorations in translation and paraphrase quality assessments and relationships.

Towards Content Transfer through Grounded Text Generation
Shrimai Prabhumoye | Chris Quirk | Michel Galley
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Recent work in neural generation has attracted significant interest in controlling the form of text, such as style, persona, and politeness. However, there has been less work on controlling neural text generation for content. This paper introduces the notion of Content Transfer for long-form text generation, where the task is to generate a next sentence in a document that both fits its context and is grounded in a content-rich external textual source such as a news story. Our experiments on Wikipedia data show significant improvements against competitive baselines. As another contribution of this paper, we release a benchmark dataset of 640k Wikipedia referenced sentences paired with the source articles to encourage exploration of this new task.

Microsoft Icecaps: An Open-Source Toolkit for Conversation Modeling
Vighnesh Leonardo Shiv | Chris Quirk | Anshuman Suri | Xiang Gao | Khuram Shahid | Nithya Govindarajan | Yizhe Zhang | Jianfeng Gao | Michel Galley | Chris Brockett | Tulasi Menon | Bill Dolan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

The Intelligent Conversation Engine: Code and Pre-trained Systems (Microsoft Icecaps) is an upcoming open-source natural language processing repository. Icecaps wraps TensorFlow functionality in a modular component-based architecture, presenting an intuitive and flexible paradigm for constructing sophisticated learning setups. Capabilities include multitask learning between models with shared parameters, upgraded language model decoding features, a range of built-in architectures, and a user-friendly data processing pipeline. The system is targeted toward conversational tasks, exploring diverse response generation, coherence, and knowledge grounding. Icecaps also provides pre-trained conversational models that can be either used directly or loaded for fine-tuning or bootstrapping other models; these models power an online demo of our framework.

2018

Assigning people to tasks identified in email: The EPA dataset for addressee tagging for detected task intent
Revanth Rameshkumar | Peter Bailey | Abhishek Jha | Chris Quirk
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

We describe the Enron People Assignment (EPA) dataset, in which tasks that are described in emails are associated with the person(s) responsible for carrying out these tasks. We identify tasks and the responsible people in the Enron email dataset. We define evaluation methods for this challenge and report scores for our model and naïve baselines. The resulting model enables a user experience operating within a commercial email service: given a person and a task, it determines if the person should be notified of the task.

Confidence Modeling for Neural Semantic Parsing
Li Dong | Chris Quirk | Mirella Lapata
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this work we focus on confidence modeling for neural semantic parsers which are built upon sequence-to-sequence models. We outline three major causes of uncertainty, and design various metrics to quantify these factors. These metrics are then used to estimate confidence scores that indicate whether model predictions are likely to be correct. Beyond confidence estimation, we identify which parts of the input contribute to uncertain predictions allowing users to interpret their model, and verify or refine its input. Experimental results show that our confidence model significantly outperforms a widely used method that relies on posterior probability, and improves the quality of interpretation compared to simply relying on attention scores.

2017

NLP for Precision Medicine
Hoifung Poon | Chris Quirk | Kristina Toutanova | Wen-tau Yih
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

We will introduce precision medicine and showcase the vast opportunities for NLP in this burgeoning field with great societal impact. We will review pressing NLP problems, state-of-the-art methods, and important applications, as well as datasets, medical resources, and practical issues. The tutorial will provide an accessible overview of biomedicine, and does not presume knowledge of biology or healthcare. The ultimate goal is to reduce the entry barrier for NLP researchers to contribute to this exciting domain.

Cross-Sentence N-ary Relation Extraction with Graph LSTMs
Nanyun Peng | Hoifung Poon | Chris Quirk | Kristina Toutanova | Wen-tau Yih
Transactions of the Association for Computational Linguistics, Volume 5

Past work in relation extraction has focused on binary relations in single sentences. Recent NLP inroads in high-value domains have sparked interest in the more general setting of extracting n-ary relations that span multiple sentences. In this paper, we explore a general relation extraction framework based on graph long short-term memory networks (graph LSTMs) that can be easily extended to cross-sentence n-ary relation extraction. The graph formulation provides a unified way of exploring different LSTM approaches and incorporating various intra-sentential and inter-sentential dependencies, such as sequential, syntactic, and discourse relations. A robust contextual representation is learned for the entities, which serves as input to the relation classifier. This simplifies handling of relations with arbitrary arity, and enables multi-task learning with related relations. We evaluate this framework in two important precision medicine settings, demonstrating its effectiveness with both conventional supervised learning and distant supervision. Cross-sentence extraction produced larger knowledge bases, and multi-task learning significantly improved extraction accuracy. A thorough analysis of various LSTM approaches yielded useful insight into the impact of linguistic analysis on extraction accuracy.

Distant Supervision for Relation Extraction beyond the Sentence Boundary
Chris Quirk | Hoifung Poon
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

The growing demand for structured knowledge has led to great interest in relation extraction, especially in cases with limited supervision. However, existing distant supervision approaches only extract relations expressed in single sentences. In general, cross-sentence relation extraction is under-explored, even in the supervised-learning setting. In this paper, we propose the first approach for applying distant supervision to cross-sentence relation extraction. At the core of our approach is a graph representation that can incorporate both standard dependencies and discourse relations, thus providing a unifying way to model relations within and across sentences. We extract features from multiple paths in this graph, increasing accuracy and robustness when confronted with linguistic variation and analysis error. Experiments on an important extraction task for precision medicine show that our approach can learn an accurate cross-sentence extractor, using only a small existing knowledge base and unlabeled text from biomedical research articles. Compared to the existing distant supervision paradigm, our approach extracted twice as many relations at similar precision, thus demonstrating the prevalence of cross-sentence relations and the promise of our approach.

2016

Improved Semantic Parsers For If-Then Statements
I. Beltagy | Chris Quirk
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Compositional Learning of Embeddings for Relation Paths in Knowledge Base and Text
Kristina Toutanova | Victoria Lin | Wen-tau Yih | Hoifung Poon | Chris Quirk
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

A Discriminative Model for Semantics-to-String Translation
Aleš Tamchyna | Chris Quirk | Michel Galley
Proceedings of the 1st Workshop on Semantics-Driven Statistical Machine Translation (S2MT 2015)

An AMR parser for English, French, German, Spanish and Japanese and a new AMR-annotated corpus
Lucy Vanderwende | Arul Menezes | Chris Quirk
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

Language to Code: Learning Semantic Parsers for If-This-Then-That Recipes
Chris Quirk | Raymond Mooney | Michel Galley
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets
Michel Galley | Chris Brockett | Alessandro Sordoni | Yangfeng Ji | Michael Auli | Chris Quirk | Margaret Mitchell | Jianfeng Gao | Bill Dolan
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Pre-Computable Multi-Layer Neural Network Language Models
Jacob Devlin | Chris Quirk | Arul Menezes
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data
Avneesh Saluja | Hany Hassan | Kristina Toutanova | Chris Quirk
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

Joint Language and Translation Modeling with Recurrent Neural Networks
Michael Auli | Michel Galley | Chris Quirk | Geoffrey Zweig
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Monolingual Marginal Matching for Translation Model Adaptation
Ann Irvine | Chris Quirk | Hal Daumé III
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Regularized Minimum Error Rate Training
Michel Galley | Chris Quirk | Colin Cherry | Kristina Toutanova
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Controlled Ascent: Imbuing Statistical MT with Linguistic Knowledge
William Lewis | Chris Quirk
Proceedings of the Second Workshop on Hybrid Approaches to Translation

Beyond Left-to-Right: Multiple Decomposition Structures for SMT
Hui Zhang | Kristina Toutanova | Chris Quirk | Jianfeng Gao
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Morphological, Syntactical and Semantic Knowledge in Statistical Machine Translation
Marta Ruiz Costa-jussà | Chris Quirk
NAACL HLT 2013 Tutorial Abstracts

Lightly Supervised Learning of Procedural Dialog Systems
Svitlana Volkova | Pallavi Choudhury | Chris Quirk | Bill Dolan | Luke Zettlemoyer
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Exact Maximum Inference for the Fertility Hidden Markov Model
Chris Quirk
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Semantic Neighborhoods as Hypergraphs
Chris Quirk | Pallavi Choudhury
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Domain Adaptation in Machine Translation: Findings from the 2012 Johns Hopkins Summer Workshop
Hal Daumé III | Marine Carpuat | Alex Fraser | Chris Quirk
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Keynote Presentations

MSR SPLAT, a language analysis toolkit
Chris Quirk | Pallavi Choudhury | Jianfeng Gao | Hisami Suzuki | Kristina Toutanova | Michael Gamon | Wen-tau Yih | Colin Cherry | Lucy Vanderwende
Proceedings of the Demonstration Session at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Book Review: Linguistic Structure Prediction by Noah A. Smith
Chris Quirk
Computational Linguistics, Volume 38, Issue 2 - June 2012

On Hierarchical Re-ordering and Permutation Parsing for Phrase-based Decoding
Colin Cherry | Robert C. Moore | Chris Quirk
Proceedings of the Seventh Workshop on Statistical Machine Translation

Leave-One-Out Phrase Model Training for Large-Scale Deployment
Joern Wuebker | Mei-Yuh Hwang | Chris Quirk
Proceedings of the Seventh Workshop on Statistical Machine Translation

2011

MSR-NLP Entry in BioNLP Shared Task 2011
Chris Quirk | Pallavi Choudhury | Michael Gamon | Lucy Vanderwende
Proceedings of BioNLP Shared Task 2011 Workshop

From pecher to pêcher... or pécher: Simplifying French Input by Accent Prediction
Pallavi Choudhury | Chris Quirk | Hisami Suzuki
Proceedings of the Workshop on Advances in Text Input Methods (WTIM 2011)

Optimal Search for Minimum Error Rate Training
Michel Galley | Chris Quirk
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Gappy Phrasal Alignment By Agreement
Mohit Bansal | Chris Quirk | Robert Moore
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Incremental Training and Intentional Over-fitting of Word Alignment
Qin Gao | Will Lewis | Chris Quirk | Mei-Yuh Hwang
Proceedings of Machine Translation Summit XIII: Papers

MT Detection in Web-Scraped Parallel Corpora
Spencer Rarrick | Chris Quirk | Will Lewis
Proceedings of Machine Translation Summit XIII: Papers

On the Expressivity of Linear Transductions
Markus Saers | Dekai Wu | Chris Quirk
Proceedings of Machine Translation Summit XIII: Papers

2010

A Large Scale Ranker-Based System for Search Query Spelling Correction
Jianfeng Gao | Xiaolong Li | Daniel Micol | Chris Quirk | Xu Sun
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
Jason R. Smith | Chris Quirk | Kristina Toutanova
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Learning Phrase-Based Spelling Error Models from Clickthrough Data
Xu Sun | Jianfeng Gao | Daniel Micol | Chris Quirk
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Top-Down K-Best A* Parsing
Adam Pauls | Dan Klein | Chris Quirk
Proceedings of the ACL 2010 Conference Short Papers

A Discriminative Lexicon Model for Complex Morphology
Minwoo Jeong | Kristina Toutanova | Hisami Suzuki | Chris Quirk
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

This paper describes successful applications of discriminative lexicon models to statistical machine translation (SMT) systems that translate into morphologically complex languages. We extend previous work on discriminatively trained lexicon models to include more contextual information in making lexical selection decisions by building a single global log-linear model of translation selection. In offline experiments, we show that the use of the expanded contextual information, including morphological and syntactic features, helps better predict words in three target languages with complex morphology (Bulgarian, Czech and Korean). We also show that these improved lexical prediction models make a positive impact in the end-to-end SMT scenario from English to these languages.

2009

Improved Smoothing for N-gram Language Models Based on Ordinary Counts
Robert C. Moore | Chris Quirk
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Less is More: Significance-Based N-gram Selection for Smaller, Better Language Models
Robert C. Moore | Chris Quirk
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

Random Restarts in Minimum Error Rate Training for Statistical Machine Translation
Robert C. Moore | Chris Quirk
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
Hao Zhang | Chris Quirk | Robert C. Moore | Daniel Gildea
Proceedings of ACL-08: HLT

Discriminative, Syntactic Language Modeling through Latent SVMs
Colin Cherry | Chris Quirk
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

We construct a discriminative, syntactic language model (LM) by using a latent support vector machine (SVM) to train an unlexicalized parser to judge sentences. That is, the parser is optimized so that correct sentences receive high-scoring trees, while incorrect sentences do not. Because of this alternative objective, the parser can be trained with only a part-of-speech dictionary and binary-labeled sentences. We follow the paradigm of discriminative language modeling with pseudo-negative examples (Okanohara and Tsujii, 2007), and demonstrate significant improvements in distinguishing real sentences from pseudo-negatives. We also investigate the related task of separating machine-translation (MT) outputs from reference translations, again showing large improvements. Finally, we test our LM in MT reranking, and investigate the language-modeling parser in the context of unsupervised parsing.

Syntactic Models for Structural Word Insertion and Deletion during Translation
Arul Menezes | Chris Quirk
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

Using Dependency Order Templates to Improve Generality in Translation
Arul Menezes | Chris Quirk
Proceedings of the Second Workshop on Statistical Machine Translation

An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation
Robert Moore | Chris Quirk
Proceedings of the Second Workshop on Statistical Machine Translation

Faster beam-search decoding for phrasal statistical machine translation
Robert C. Moore | Chris Quirk
Proceedings of Machine Translation Summit XI: Papers

Generative models of noisy translations with applications to parallel fragment extraction
Chris Quirk | Raghavendra Udupa U. | Arul Menezes
Proceedings of Machine Translation Summit XI: Papers

2006

Do we need phrases? Challenging the conventional wisdom in Statistical Machine Translation
Chris Quirk | Arul Menezes
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

The impact of parse quality on syntactically-informed statistical machine translation
Chris Quirk | Simon Corston-Oliver
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

Microsoft Research Treelet Translation System: NAACL 2006 Europarl Evaluation
Arul Menezes | Kristina Toutanova | Chris Quirk
Proceedings on the Workshop on Statistical Machine Translation

2005

Dependency Treelet Translation: The Convergence of Statistical and Example-based Machine-translation?
Arul Menezes | Chris Quirk
Workshop on example-based machine translation

We describe a novel approach to machine translation that combines the strengths of the two leading corpus-based approaches: Phrasal SMT and EBMT. We use a syntactically informed decoder and reordering model based on the source dependency tree, in combination with conventional SMT models to incorporate the power of phrasal SMT with the linguistic generality available in a parser. We show that this approach significantly outperforms a leading string-based Phrasal SMT decoder and an EBMT system. We present results from two radically different language pairs, and investigate the sensitivity of this approach to parse quality by using two distinct parsers and oracle experiments. We also validate our automated BLEU scores with a small human evaluation.

Microsoft Research Treelet Translation System: IWSLT Evaluation
Arul Menezes | Chris Quirk
Proceedings of the Second International Workshop on Spoken Language Translation

Dependency Treelet Translation: Syntactically Informed Phrasal SMT
Chris Quirk | Arul Menezes | Colin Cherry
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources
Bill Dolan | Chris Quirk | Chris Brockett
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Statistical machine translation using labeled semantic dependency graphs
Anthony Aue | Arul Menezes | Bob Moore | Chris Quirk | Eric Ringger
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

Monolingual Machine Translation for Paraphrase Generation
Chris Quirk | Chris Brockett | William Dolan
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2003

Disambiguation of English PP attachment using multilingual aligned data
Lee Schwartz | Takako Aikawa | Chris Quirk
Proceedings of Machine Translation Summit IX: Papers

Prepositional phrase attachment (PP attachment) is a major source of ambiguity in English. It poses a substantial challenge to Machine Translation (MT) between English and languages that are not characterized by PP attachment ambiguity. In this paper we present an unsupervised, bilingual, corpus-based approach to the resolution of English PP attachment ambiguity. As data we use aligned linguistic representations of the English and Japanese sentences from a large parallel corpus of technical texts. The premise of our approach is that with large aligned, parsed, bilingual (or multilingual) corpora, languages can learn non-trivial linguistic information from one another with high accuracy. We contend that our approach can be extended to linguistic phenomena other than PP attachment.

2002

English-Japanese Example-Based Machine Translation Using Abstract Linguistic Representations
Chris Brockett | Takako Aikawa | Anthony Aue | Arul Menezes | Chris Quirk | Hisami Suzuki
COLING-02: Machine Translation in Asia
