Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Government and Commercial Uses of MT

October 21-25
Waikiki, USA
Association for Machine Translation in the Americas
Machine Translation for Triage and Exploitation of Massive Text Data
James E. Andrews | Kristen Summers

The National Ground Intelligence Center (NGIC) collects massive quantities of textual data in foreign languages. To support exploitation in light of intelligence requirements, a triage process must be applied to this data as those requirements emerge, to identify the most useful data for further exploitation. Machine translation provides critical support for this triage. This paper outlines the types of collected data and the different challenges they present for machine translation, as well as the types of triage to support for collections of this nature, and the issues raised for machine translation by those uses.

Global Public Health Intelligence Network (GPHIN)
Michael Blench

GPHIN is a secure Internet-based “early warning” system that gathers preliminary reports of public health significance on a near “real-time” basis, 24 hours a day, 7 days a week. This unique multilingual system gathers and disseminates relevant information on disease outbreaks and other public health events by monitoring global media sources such as news wires and web sites. This monitoring is done in nine languages with machine translation being used to translate non-English articles into English and English articles into the other languages. The information is filtered for relevancy by an automated process which is then complemented by human analysis. The output is categorized and made accessible to users. Notifications about public health events that may have serious public health consequences are immediately forwarded to users. GPHIN is managed by the Public Health Agency of Canada’s Centre for Emergency Preparedness and Response (CEPR), which was created in July 2000 to serve as Canada’s central coordinating point for public health security. It is considered a centre of expertise in the area of civic emergencies including natural disasters and malicious acts with health repercussions.

Sharing User Dictionaries Across Multiple Systems with UTX-S
Francis Bond | Seiji Okura | Yuji Yamamoto | Toshiki Murata | Kiyotaka Uchimoto | Michael Kato | Miwako Shimazu | Tsugiyoshi Suzuki

Careful tuning of user-created dictionaries is indispensable when using a machine translation system for computer aided translation. However, there is no widely used standard for user dictionaries in the Japanese/English machine translation market. To address this issue, AAMT (the Asia-Pacific Association for Machine Translation) has established a specification of sharable dictionaries (UTX-S: Universal Terminology eXchange -- Simple), which can be used across different machine translation systems, thus increasing the interoperability of language resources. UTX-S is simpler than existing specifications such as UPF and OLIF. It was explicitly designed to make it easy to (a) add new user dictionaries and (b) share existing user dictionaries. This facilitates rapid user dictionary production and avoids vendor tie-in. In this study we describe the UTX-Simple (UTX-S) format, and show that it can be converted to the user dictionary formats for five commercial English-Japanese MT systems. We then present a case study where we (a) convert an on-line glossary to UTX-S, and (b) produce user dictionaries for five different systems, and then exchange them. The results show that the simplified format of UTX-S can be used to rapidly build dictionaries. Further, we confirm that customized user dictionaries are effective across systems, although with a slight loss in quality: on average, user dictionaries improved the translations for 44.8% of translations with the systems they were built for and 37.3% of translations for different systems. In ongoing work, AAMT is using UTX-S as the format in building up a user community for producing, sharing, and accumulating user dictionaries in a sustainable way.
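The conversion workflow the abstract describes can be illustrated with a minimal sketch. The exact UTX-S field layout is not given in the abstract, so the tab-separated headword/target/part-of-speech records and the per-system output template below are assumptions for illustration only, not the published specification.

```python
# Hypothetical illustration of a UTX-S-style conversion pipeline.
# The field layout (source term, target term, POS, tab-separated, "#" for
# comment/header lines) is an assumption, not the actual UTX-S spec.

def parse_utx_s(text):
    """Parse a UTX-S-like tab-separated glossary into entry dicts."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip comments and headers
            continue
        # Pad so short records still yield three fields.
        src, tgt, pos = (line.split("\t") + ["", ""])[:3]
        entries.append({"src": src, "tgt": tgt, "pos": pos})
    return entries

def to_system_format(entries, template="{src}={tgt}/{pos}"):
    """Render entries in one (hypothetical) MT system's dictionary syntax."""
    return "\n".join(template.format(**e) for e in entries)

glossary = "# src\ttgt\tpos\nmachine translation\t機械翻訳\tnoun"
print(to_system_format(parse_utx_s(glossary)))
# machine translation=機械翻訳/noun
```

Because each target system differs only in its surface syntax, supporting a new system amounts to supplying another output template, which is the interoperability argument the paper makes for a simple shared source format.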

Many-to-Many Multilingual Medical Speech Translation on a PDA
Kyoko Kanzaki | Yukie Nakao | Manny Rayner | Marianne Santaholma | Marianne Starlander | Nikos Tsourakis

Particularly considering the requirement of high reliability, we argue that the most appropriate architecture for a medical speech translator that can be realised using today’s technology combines unidirectional (doctor to patient) translation, medium-vocabulary controlled language coverage, interlingua-based translation, an embedded help component, and deployability on a hand-held hardware platform. We present an overview of the Open Source MedSLT prototype, which has been developed in accordance with these design principles. The system is implemented on top of the Regulus and Nuance 8.5 platforms, translates patient examination questions for all language pairs in the set {English, French, Japanese, Arabic, Catalan}, using vocabularies of about 400 to 1,100 words, and can be run in a distributed client/server environment, where the client application is hosted on a Nokia Internet Tablet device.

MT errors in Chinese-to-English MT systems: user feedback
Shin Chang-Meadows

Working with the US Government: Information Resources
Jennifer DeCamp

This document provides information on how companies and researchers in machine translation can work with the U.S. Government. Specifically, it addresses information on (1) groups in the U.S. Government working with translation and potentially having a need for machine translation; (2) means for companies and researchers to provide information to the United States Government about their work; and (3) U.S. Government organizations providing grants of possible interest to this community.

Reliable Innovation: A Tecchie’s Travels in the Land of Translators
Alain Désilets | Louise Brunette | Christiane Melançon | Geneviève Patenaude

Machine Translation (MT) is rapidly progressing towards quality levels that might make it appropriate for broad user populations in a range of scenarios, including gisting and post-editing in unconstrained domains. For this to happen, the field may, however, need to switch gears and move away from its current technology-driven paradigm to a more user-centered approach. In this paper, we discuss how ethnographic techniques like Contextual Inquiry could help in that respect, by providing researchers and developers with rich information about the world and needs of potential end-users. We discuss how data from Contextual Inquiries with professional translators was used to concretely and positively influence several research and development projects in the area of Computer Assisted Translation technology. These inquiries had many benefits, including: (i) grounding developers and researchers in the world of their end-users, (ii) generating new technology ideas, (iii) selecting between competing development project ideas, (iv) finding how to alleviate friction for important ideas that go against the grain of current user practices, (v) evaluating existing or experimental technologies, (vi) helping with micro-level design decisions, (vii) building credibility with translators, and (viii) fostering multidisciplinary discussion between researchers.

Automated Machine Translation Improvement Through Post-Editing Techniques: Analyst and Translator Experiments
Jennifer Doyon | Christine Doran | C. Donald Means | Domenique Parr

From the Automatic Language Processing Advisory Committee (ALPAC) (Pierce et al., 1966) machine translation (MT) evaluations of the ‘60s to the Defense Advanced Research Projects Agency (DARPA) Global Autonomous Language Exploitation (GALE) (Olive, 2008) and National Institute of Standards and Technology (NIST) (NIST, 2008) MT evaluations of today, the U.S. Government has been instrumental in establishing measurements and baselines for the state-of-the-art in MT engines. In the same vein, the Automated Machine Translation Improvement Through Post-Editing Techniques (PEMT) project sought to establish a baseline of MT engines based on the perceptions of potential users. In contrast to these previous evaluations, the PEMT project’s experiments also determined the minimal quality level that MT output needed to achieve before users found it acceptable. Based on these findings, the PEMT team investigated using post-editing techniques to achieve this level. This paper will present experiments in which analysts and translators were asked to evaluate MT output processed with varying post-editing techniques. The results show at what level the analysts and translators find MT useful and are willing to work with it. We also establish a ranking of the types of post-edits necessary to elevate MT output to the minimal acceptance level.

User-centered MT Development and Implementation
Kathleen Egan | Francis Kubala | Allen Sears

Identifying Common Challenges for Human and Machine Translation: A Case Study from the GALE Program
Lauren Friedman | Stephanie Strassel

The dramatic improvements shown by statistical machine translation systems in recent years clearly demonstrate the benefits of having large quantities of manually translated parallel text for system training and development. And while many competing evaluation metrics exist to evaluate MT technology, most of those methods also crucially rely on the existence of one or more high quality human translations to benchmark system performance. Given the importance of human translations in this framework, understanding the particular challenges of human translation-for-MT is key, as is comprehending the relative strengths and weaknesses of human versus machine translators in the context of an MT evaluation. Vanni (2000) argued that the metric used for evaluation of competence in human language learners may be applicable to MT evaluation; we apply similar thinking to improve the prediction of MT performance, which is currently unreliable. In the current paper we explore an alternate model based upon a set of genre-defining features that prove to be consistently challenging for both humans and MT systems.

Automatic Translation of Court Judgments
Fabrizio Gotti | Guy Lapalme | Elliott Macklovitch | Atefeh Farzindar

This document presents an experiment in the automatic translation of Canadian Court judgments from English to French and from French to English. We show that although the language used in this type of legal text is complex and specialized, an SMT system can produce intelligible and useful translations, provided that the system can be trained on a vast amount of legal text. We also describe the results of a human evaluation of the output of the system.

Designing and executing MT workflows through the Kepler Framework
Reginald Hobbs | Clare Voss

ClipperRSS: A Light-Weight Prototype for the Cross-language Exploitation of Syndicated Feeds
Rod Holland | Brenden Keyes

Syndicated feeds in RSS, Atom, and related formats have emerged as ubiquitous information sources in World Wide Web language communities including Arabic, Farsi, Chinese, and others, providing subscribers with timely updates on topics of particular interest. We have modified an existing Open Source RSS reader, Sage, for cross-language use, permitting English-speakers to discover, subscribe to, update, and browse RSS feeds in ten languages. This early prototype, called ClipperRSS, has been integrated with the Clipper cross-language information retrieval tool. The integrated system provides English-speakers with an effective means of exploring the potential of foreign-language syndicated feeds in their domains of interest.

Trends in automated translation in today’s global business
Sophie Hurst

SDL, in association with the International Association for Machine Translation (IAMT) and Association for Machine Translation Americas (AMTA), ran a survey which was completed by over 385 individuals in global businesses. The results were fascinating and definitely show an increased interest in the use of automated translation over the last two years.

Machine Translation for Indonesian and Tagalog
Brianna Laugher | Ben MacLeod

Kataku is a hybrid MT system for Indonesian to English and English to Indonesian translation, available on Windows, Linux and web-based platforms. This paper briefly presents the technical background to Kataku, some of its use cases and extensions. Kataku is the flagship product of ToggleText, a language technology company based in Melbourne, Australia.

Real-time translation of IM Chat
Robert Levin

TransSearch: What are translators looking for?
Elliott Macklovitch | Guy Lapalme | Fabrizio Gotti

Notwithstanding machine translation’s impressive progress over the last decade, many translators remain convinced that the output of even the best MT systems is not sufficient to facilitate the production of publication-quality texts. To increase their productivity they turn instead to translator support tools. We examine the use of one such tool: TransSearch, an online bilingual concordancer. From the millions of requests stored in the system’s logs over a 6-year period, we extracted and analyzed the most frequently submitted queries, in an effort to characterize the kinds of problems for which translators turn to this system for help. What we discover, somewhat surprisingly, is that our system seems particularly well-suited to help translate highly polysemous adverbials and prepositional phrases.
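The core of the analysis described above is ranking queries from the system's logs by submission frequency. A minimal sketch, assuming a hypothetical one-query-per-line log format (the real TransSearch log schema is not given in the abstract):

```python
from collections import Counter

# Illustrative only: the actual TransSearch log format is not described in
# the abstract, so we assume one normalized query string per log line.
def top_queries(log_lines, n=3):
    """Return the n most frequently submitted queries, case-folded."""
    counts = Counter(line.strip().lower() for line in log_lines if line.strip())
    return counts.most_common(n)

log = ["in light of", "as such", "in light of",
       "subject to", "in light of", "as such"]
print(top_queries(log, n=2))
# [('in light of', 3), ('as such', 2)]
```

On millions of logged requests, a frequency table like this is what lets the authors characterize which expression types (here, polysemous adverbials and prepositional phrases) dominate translators' lookups.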

Language Translation Solutions for Community Content
Daniel Marcu

The Use of Machine-generated Transcripts during Human Translation
Allison L. Powell | Allison Blodgett

At the request of the USG National Virtual Translation Center, the University of Maryland Center for Advanced Study of Language conducted a study that assessed the role of several factors mediating transcript usefulness during translation tasks. These factors included source language (Mandarin or Modern Standard Arabic), native speaker status of the translators, transcript quality (low or moderate word error rate), and transcript functionality (static or dynamic). Using 54 Mandarin and 54 Arabic translators (half native speakers in each language) and broadcast news clips for input, the study demonstrated that translation environments that provide dynamic transcripts with low or moderate word error rates are likely to improve performance (measured as integrated speed and accuracy scores) among non-native speakers without decreasing performance among native speakers.

Meeting Army Foreign Language Requirements with the Aid of Machine Translation
Cecil MacPherson | Devin Rollis | Irene Zehmisch

The United States Army has a wide range of language requirements, varying greatly in both the number of requisite languages, and the complexity of the tasks for which language translation is crucial. Machine language translation will be an important part of the support needed to translate documents, monitor news media, and engage non-English speakers in conversation. The machine language translation community has made significant advances in the technology over the past several years, and the Army is looking to both support research and development, and to capitalize on the technology to improve communication and save lives. The Army Language Requirements Branch and the Sequoyah Program Office have received several requests from language technology developers for information on the direction and end-state goals of the Sequoyah program. In this paper, we will attempt to describe the Army’s language needs and to document requirements and goals for a machine language translation program.

Hybrid Machine Translation Applied to Media Monitoring
Hassan Sawaf | Braddock Gaskill | Michael Veronis

In this paper, a system is presented that recognizes spoken utterances in Arabic dialects and translates them into English text. The input is recorded from a broadcast channel and recognized using automatic speech recognition engines for Modern Standard Arabic and Iraqi Colloquial Arabic. The recognized utterances are normalized into Modern Standard Arabic, and this Modern Standard Arabic interlingua is then translated by a hybrid machine translation system combining statistical and rule-based features.

Artificial Cognitive MT Post-Editing Intelligence
Jörg Schütz

Post-editing (PE) is a necessary process in every MT deployment environment. The competences needed for PE are traditionally seen as a subset of a human translator's competence. Meanwhile, some companies are accepting that the PE process involves self-standing linguistic tasks, which need their own training efforts and appropriate software tool support. To date, we still lack recorded qualitative and quantitative PE user-activity data that adequately describe the tasks performed and, in particular, the human cognitive processes involved. This data is needed to effectively model, design and implement supportive software systems which, on the one hand, efficiently guide the human post-editor and enhance her cognitive capabilities, and, on the other hand, have a certain influence on the translation performance and competence of the employed MT system. In this paper we argue for a framework of practices to describe the PE process by correlating data obtained in laboratory experiments, augmented by additional data from different resources such as interviews and mathematical prediction models, with the tasks fulfilled, and to model the identified process in a multi-faceted fashion as a basis for the implementation of a human PE-aware interactive software system.

Language Processing for Analysis and Investigation
Kristen Summers | Diane Chandler

This paper describes an operational case and document management and exploitation system, GlobalView, that includes Machine Translation (MT) for use as an aid to human effort in analysis and investigation. It also presents the REFLEX platform for experimenting with language processing tools.

Embedding Technology at the front end of the Human Translation Workflow: An NVTC Vision
Carol van Ess-Dykema | Helen G. Gigley | Stephen Lewis | Emily Vancho Bannister

This paper describes the strategic vision for a new translation management workflow for the US Government’s National Virtual Translation Center (NVTC). The paper also describes past, current, and planned experiments validating the vision, along with experiment results to-date. The most salient features of the new workflow include the embedding of translation technology at the front end of the workflow (e.g., translation memory technology, specialized lexicons, and machine translation), technology-generated “seed translation”, a new human work role called “paralinguist” to assess the “seed translation” and assign an appropriate translator/post-editor, and new human translation strategies including federated search of online dictionaries and collaborative translation.

Applicability of Resource-based Machine Translation to Airplane Manuals
Eiko Yamamoto | Akira Terada | Hitoshi Isahara

Machine translation (MT) has been studied and developed since the advent of computers, and yet is rarely used in actual business. For business use, rule-based MT has been developed, but it requires rules and a domain-specific dictionary that must be created manually. On the other hand, as huge amounts of text data have become available, corpus-based MT has been actively studied, particularly corpus-based statistical machine translation (SMT). In this study, we tested and verified the usefulness of SMT for aviation manuals. Manuals tend to be similar and repetitive, so SMT is powerful even with a small amount of training data. Although our experiments with SMT are at the preliminary stage, the BLEU score is high. SMT appears to be a powerful and promising technique in this domain.
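The abstract reports its result as a BLEU score. As a reminder of what that metric computes, here is a textbook sentence-level sketch (modified n-gram precision with a brevity penalty, after Papineni et al., 2002); it is not the authors' evaluation code, and real toolkits add smoothing and corpus-level pooling.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights and a brevity penalty.

    Unsmoothed: a zero n-gram match at any order zeroes the score, which
    is why production toolkits apply smoothing for short sentences.
    """
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("remove the fuel pump", "remove the fuel pump"))  # 1.0
```

The repetitive, templatic language of maintenance manuals means candidate n-grams frequently match the reference exactly, which is consistent with the high scores the paper reports for this domain even with modest training data.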

Applications of MT during Olympic Games 2008
Chengqing Zong | Heyan Huang | Shuming Shi