Lauren Friedman


2008

pdf
New Resources for Document Classification, Analysis and Translation Technologies
Stephanie Strassel | Lauren Friedman | Safa Ismael | Linda Brandschain
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The goal of the DARPA MADCAT (Multilingual Automatic Document Classification Analysis and Translation) Program is to automatically convert foreign language text images into English transcripts, for use by humans and downstream applications. The first phase the program focuses on translation of handwritten Arabic documents. Linguistic Data Consortium (LDC) is creating publicly available linguistic resources for MADCAT technologies, on a scale and richness not previously available. Corpora will consist of existing LDC corpora and data donations from MADCAT partners, plus new data collection to provide high quality material for evaluation and to address strategic gaps (for genre, dialect, image quality, etc.) in the existing resources. Training and test data properties will expand over time to encompass a wide range of topics and genres: letters, diaries, training manuals, brochures, signs, ledgers, memos, instructions, postcards and forms among others. Data will be ground truthed, with line, word and token segmentation and zoning, and translations and word alignments will be produced for a subset. Evaluation data will be carefully selected from the available data pools and high quality references will be produced, which can be used to compare MADCAT system performance against the human-produced gold standard.

pdf
Management of Large Annotation Projects Involving Multiple Human Judges: a Case Study of GALE Machine Translation Post-editing
Meghan Lammie Glenn | Stephanie Strassel | Lauren Friedman | Haejoong Lee | Shawn Medero
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Managing large groups of human judges to perform any annotation task is a challenge. Linguistic Data Consortium coordinated the creation of manual machine translation post-editing results for the DARPA Global Autonomous Language Exploration Program. Machine translation is one of three core technology components for GALE, which includes an annual MT evaluation administered by National Institute of Standards and Technology. Among the training and test data LDC creates for the GALE program are gold standard translations for system evaluation. The GALE machine translation system evaluation metric is edit distance, measured by HTER (human translation edit rate), which calculates the minimum number of changes required for highly-trained human editors to correct MT output so that it has the same meaning as the reference translation. LDC has been responsible for overseeing the post-editing process for GALE. We describe some of the accomplishments and challenges of completing the post-editing effort, including developing a new web-based annotation workflow system, and recruiting and training human judges for the task. In addition, we suggest that the workflow system developed for post-editing could be ported efficiently to other annotation efforts.

pdf
A Quality Control Framework for Gold Standard Reference Translations: The Process and Toolkit Developed for GALE
Lauren Friedman
Proceedings of Translating and the Computer 30

pdf
Identifying Common Challenges for Human and Machine Translation: A Case Study from the GALE Program
Lauren Friedman | Stephanie Strassel
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Government and Commercial Uses of MT

The dramatic improvements shown by statistical machine translation systems in recent years clearly demonstrate the benefits of having large quantities of manually translated parallel text for system training and development. And while many competing evaluation metrics exist to evaluate MT technology, most of those methods also crucially rely on the existence of one or more high quality human translations to benchmark system performance. Given the importance of human translations in this framework, understanding the particular challenges of human translation-for-MT is key, as is comprehending the relative strengths and weaknesses of human versus machine translators in the context of an MT evaluation. Vanni (2000) argued that the metric used for evaluation of competence in human language learners may be applicable to MT evaluation; we apply similar thinking to improve the prediction of MT performance, which is currently unreliable. In the current paper we explore an alternate model based upon a set of genre-defining features that prove to be consistently challenging for both humans and MT systems.

2007


Linguistic resources in support of various evaluation metrics
Christopher Cieri | Stephanie Strassel | Meghan Lammie Glenn | Lauren Friedman
Proceedings of the Workshop on Automatic procedures in MT evaluation