Vamshi Ambati

2011

Active Learning with Multiple Annotations for Comparable Data Classification Task
Vamshi Ambati | Sanjika Hewavitharana | Stephan Vogel | Jaime Carbonell
Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web

CMU Haitian Creole-English Translation System for WMT 2011
Sanjika Hewavitharana | Nguyen Bach | Qin Gao | Vamshi Ambati | Stephan Vogel
Proceedings of the Sixth Workshop on Statistical Machine Translation

Multi-Strategy Approaches to Active Learning for Statistical Machine Translation
Vamshi Ambati | Stephan Vogel | Jaime Carbonell
Proceedings of Machine Translation Summit XIII: Papers

2010

Active Semi-Supervised Learning for Improving Word Alignment
Vamshi Ambati | Stephan Vogel | Jaime Carbonell
Proceedings of the NAACL HLT 2010 Workshop on Active Learning for Natural Language Processing

Can Crowds Build Parallel Corpora for Machine Translation Systems?
Vamshi Ambati | Stephan Vogel
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

Active Learning-Based Elicitation for Semi-Supervised Word Alignment
Vamshi Ambati | Stephan Vogel | Jaime Carbonell
Proceedings of the ACL 2010 Conference Short Papers

Active Learning and Crowd-Sourcing for Machine Translation
Vamshi Ambati | Stephan Vogel | Jaime Carbonell
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Large-scale parallel data generation for new language pairs requires intensive human effort and the availability of experts. Providing Statistical Machine Translation (SMT) systems for most languages is difficult and costly because few expert translators are available to produce parallel data. Even where experts are available, the cost of employing them appears prohibitive. In this paper we propose Active Crowd Translation (ACT), a new paradigm where active learning and crowd-sourcing come together to enable automatic translation for low-resource language pairs. Active learning aims to reduce the cost of label acquisition by prioritizing the most informative data for annotation, while crowd-sourcing reduces cost by using the power of the crowd to make up for the lack of expensive language experts. We compare our active learning strategies with strong baselines and see significant improvements in translation quality. Similarly, our experiments with crowd-sourcing on Mechanical Turk show that it is possible to create parallel corpora using non-experts and that, with sufficient quality assurance, a translation system trained on such a corpus approaches expert quality.

2009

An Improved Statistical Transfer System for French-English Machine Translation
Greg Hanneman | Vamshi Ambati | Jonathan H. Clark | Alok Parlikar | Alon Lavie
Proceedings of the Fourth Workshop on Statistical Machine Translation

Proactive Learning for Building Machine Translation Systems for Minority Languages
Vamshi Ambati | Jaime Carbonell
Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing

Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages
Vamshi Ambati | Alon Lavie | Jaime Carbonell
Proceedings of Machine Translation Summit XII: Posters

2008

Producing machine translation (MT) for the many minority languages in the world is a serious challenge. Minority languages typically have few resources for building MT systems. For many minority languages there is little machine-readable text, few knowledgeable linguists, and little money available for MT development. For these reasons, our research programs on minority language MT have focused on leveraging to the maximum extent two resources that are available for minority languages: linguistic structure and bilingual informants. All natural languages contain linguistic structure. Although the details of that structure vary from language to language, language universals, such as context-free syntactic structure and the paradigmatic structure of inflectional morphology, allow us to learn the specific details of a minority language. Similarly, most minority languages have speakers who are bilingual in the major language of the area. This paper discusses our efforts to utilize linguistic structure and the translation information that bilingual informants can provide in three sub-areas of our rapid development MT program: morphology induction, syntactic transfer rule learning, and refinement of imperfect learned rules.

Improving Syntax-Driven Translation Models by Re-structuring Divergent and Nonisomorphic Parse Tree Structures
Vamshi Ambati | Alon Lavie
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Student Research Workshop

Syntax-based approaches to statistical MT require syntax-aware methods for acquiring their underlying translation models from parallel data. This acquisition process can be driven by syntactic trees for either the source or target language, or by trees on both sides. Work to date has demonstrated that using trees for both sides suffers from severe coverage problems. This is primarily due to the highly restrictive space of constituent segmentations that the trees on the two sides introduce, which adversely affects the recall of the resulting translation models. Approaches that project from trees on one side, on the other hand, have higher levels of recall but suffer from lower precision, due to the lack of syntactically aware word alignments. In this paper we explore the issue of lexical coverage of the translation models learned in both of these scenarios. We specifically look at how the non-isomorphic nature of the parse trees for the two languages affects recall and coverage. We then propose a novel technique for restructuring target parse trees that generates highly isomorphic target trees while preserving the syntactic boundaries of constituents that were aligned in the original parse trees. We evaluate the translation models learned from these restructured trees and show that they are significantly better than those learned using either trees on both sides or trees on one side.