2014
Handling entities in MT/CAT/HLT
Keith Miller
|
Linda Moreau
|
Sherri Condon
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: Tutorials
2012
Producing Data for Under-Resourced Languages: A Dari-English Parallel Corpus of Multi-Genre Text
Sherri Condon
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Government MT User Program
Developers producing language technology for under-resourced languages often find relatively little machine-readable text for data required to train machine translation systems. Typically, the kinds of text that are most accessible for production of parallel data are news and news-related genres, yet the language that requires translation for analysts and decision-makers reflects a broad range of forms and contents. The proposed paper will describe an effort funded by the ODNI FLPO in which the Army Research Laboratory, assisted by MITRE language technology researchers, produced a Dari-English parallel corpus containing text in a variety of styles and genres that more closely resemble the kinds of documents needed by government users than do traditional news genres. The data production effort began with a survey of Dari documents catalogued in a government repository of material obtained from the field in Afghanistan. Because the documents in the repository are not available for creation of parallel corpora, the goal was to quantify the types of documents in the collection and identify their linguistic features in order to find documents that are similar. Document images were obtained from two sources: (1) the Preserving and Creating Access to Unique Afghan Records collection, an online resource produced by the University of Arizona Libraries and the Afghanistan Centre at Kabul University and (2) The University of Nebraska Arthur Paul Afghanistan Collection. For the latter, document images were obtained by camera capture of books and by selecting PDF images of microfiche records. A set of 1395 document page images was selected to provide 250,000 translated English words in 10 content domains. The images were transcribed and translated according to specifications designed to maximize the quality and usefulness of the data.
The corpus will be used to create a Dari-English glossary, and an experiment will quantify improvements to Dari-English translation of multi-genre text when a generic Dari-English machine translation system is customized using the corpus. The proposed paper will present highlights from these efforts.
2010
Evaluation of Machine Translation Errors in English and Iraqi Arabic
Sherri Condon
|
Dan Parvaz
|
John Aberdeen
|
Christy Doran
|
Andrew Freeman
|
Marwan Awad
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Errors in machine translations of English-Iraqi Arabic dialogues were analyzed at two different points in the systems' development using HTER methods to identify errors and human annotations to refine TER annotations. The analyses were performed on approximately 100 translations into each language from 4 translation systems collected at two annual evaluations. Although the frequencies of errors in the more mature systems were lower, the proportions of error types exhibited little change. Results include high frequencies of pronoun errors in translations to English, high frequencies of subject person inflection in translations to Iraqi Arabic, similar frequencies of word order errors in both translation directions, and very low frequencies of polarity errors. The problems with many errors can be generalized as the need to insert lexemes not present in the source or vice versa, which includes errors in multi-word expressions. Discourse context will be required to resolve some problems with deictic elements like pronouns.
2009
Normalization for Automated Metrics: English and Arabic Speech Translation
Sherri Condon
|
Gregory A. Sanders
|
Dan Parvaz
|
Alan Rubenstein
|
Christy Doran
|
John Aberdeen
|
Beatrice Oshika
Proceedings of Machine Translation Summit XII: Papers
Name Matching between Roman and Chinese Scripts: Machine Complements Human
Ken Samuel
|
Alan Rubenstein
|
Sherri Condon
|
Alex Yeh
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)
2008
Odds of Successful Transfer of Low-Level Concepts: a Key Metric for Bidirectional Speech-to-Speech Machine Translation in DARPA’s TRANSTAC Program
Gregory Sanders
|
Sébastien Bronsart
|
Sherri Condon
|
Craig Schlenoff
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
The Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program is a Defense Advanced Research Projects Agency (DARPA) program to create bidirectional speech-to-speech machine translation (MT) that will allow U.S. Soldiers and Marines, speaking only English, to communicate, in tactical situations, with civilian populations who speak only other languages (for example, Iraqi Arabic). A key metric for the program is the odds of successfully transferring low-level concepts, defined as the source-language content words. The National Institute of Standards and Technology (NIST) has now carried out two large-scale evaluations of TRANSTAC systems, using that metric. In this paper we discuss the merits of that metric. It has proven to be quite informative. We describe exactly how we defined this metric and how we obtained values for it from panels of bilingual judges, allowing others to do what we have done. We compare results on this metric to results on Likert-type judgments of semantic adequacy, from the same panels of bilingual judges, as well as to a suite of typical automated MT metrics (BLEU, TER, METEOR).
Applying Automated Metrics to Speech Translation Dialogs
Sherri Condon
|
Jon Phillips
|
Christy Doran
|
John Aberdeen
|
Dan Parvaz
|
Beatrice Oshika
|
Greg Sanders
|
Craig Schlenoff
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Over the past five years, the Defense Advanced Research Projects Agency (DARPA) has funded development of speech translation systems for tactical applications. A key component of the research program has been extensive system evaluation, with dual objectives of assessing progress overall and comparing among systems. This paper describes the methods used to obtain BLEU, TER, and METEOR scores for two-way English-Iraqi Arabic systems. We compare the scores with measures based on human judgments and demonstrate the effects of normalization operations on BLEU scores. Issues that are highlighted include the quality of test data and differential results of applying automated metrics to Arabic vs. English.
Performance Evaluation of Speech Translation Systems
Brian Weiss
|
Craig Schlenoff
|
Greg Sanders
|
Michelle Steves
|
Sherri Condon
|
Jon Phillips
|
Dan Parvaz
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
One of the most challenging tasks for uniformed service personnel serving in foreign countries is effective verbal communication with the local population. To remedy this problem, several companies and academic institutions have been funded to develop machine translation systems as part of the DARPA TRANSTAC (Spoken Language Communication and Translation System for Tactical Use) program. The goal of this program is to demonstrate capabilities to rapidly develop and field free-form, two-way translation systems that would enable speakers of different languages to communicate with one another in real-world tactical situations. DARPA has mandated that each TRANSTAC technology be evaluated numerous times throughout the life of the program and has tasked the National Institute of Standards and Technology (NIST) to lead this effort. This paper describes the experimental design methodology and test procedures from the most recent evaluation, conducted in July 2007, which focused on English to/from Iraqi Arabic.
Learning to Match Names Across Languages
Inderjeet Mani
|
Alex Yeh
|
Sherri Condon
Coling 2008: Proceedings of the workshop Multi-source Multilingual Information Extraction and Summarization
2006
Name Translation
Keith Miller
|
Sherri Condon
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Tutorials
Cross Linguistic Name Matching in English and Arabic
Andrew Freeman
|
Sherri Condon
|
Christopher Ackerman
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference
What's in a Name: Current Methods, Applications, and Evaluation in Multilingual Name Search and Matching
Sherri Condon
|
Keith Miller
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts
2002
Sharing Problems and Solutions for Machine Translation of Spoken and Written Interaction
Sherri Condon
|
Keith Miller
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems
1999
Measuring Conformity to Discourse Routines in Decision-Making Interactions
Sherri L. Condon
|
Claude G. Cech
|
William R. Edwards
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics