Sébastien Bronsart


2008

Odds of Successful Transfer of Low-Level Concepts: a Key Metric for Bidirectional Speech-to-Speech Machine Translation in DARPA’s TRANSTAC Program
Gregory Sanders | Sébastien Bronsart | Sherri Condon | Craig Schlenoff
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The Spoken Language Communication and Translation System for Tactical Use (TRANSTAC) program is a Defense Advanced Research Projects Agency (DARPA) program to create bidirectional speech-to-speech machine translation (MT) that will allow U.S. Soldiers and Marines, speaking only English, to communicate in tactical situations with civilian populations who speak only other languages (for example, Iraqi Arabic). A key metric for the program is the odds of successfully transferring low-level concepts, defined as the source-language content words. The National Institute of Standards and Technology (NIST) has now carried out two large-scale evaluations of TRANSTAC systems using that metric. In this paper we discuss the merits of that metric, which has proven to be quite informative. We describe exactly how we defined this metric and how we obtained values for it from panels of bilingual judges, so that others can replicate what we have done. We compare results on this metric to results on Likert-type judgments of semantic adequacy from the same panels of bilingual judges, as well as to results from a suite of typical automated MT metrics (BLEU, TER, METEOR).

Translation Adequacy and Preference Evaluation Tool (TAP-ET)
Mark Przybocki | Kay Peterson | Sébastien Bronsart
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Evaluation of Machine Translation (MT) technology is often tied to the requirement for tedious manual judgments of translation quality. While automated MT metrology continues to be an active area of research, a well-known and widely accepted standard metric is the manual human assessment of adequacy and fluency. Several software packages have been used to facilitate these judgments, but for the 2008 NIST Open MT Evaluation, NIST’s Speech Group created an online software tool to accommodate the requirement for centralized data and distributed judges. This paper introduces the NIST TAP-ET application and reviews the reasoning underlying its design. Where available, analysis of data sets judged for Adequacy and Preference using the TAP-ET application is presented. TAP-ET is freely available for download and contains a variety of customizable features.