Hierarchical Phrase-Based MT for Phonetic Representation-Based Speech Translation
The paper presents a novel technique for speech translation using hierarchical phrase-based statistical machine translation (HPB-SMT). The system translates speech from phone sequences, as opposed to the conventional approach of translating from word sequences. The technique facilitates speech translation by allowing a machine translation (MT) system to access phonetic information. This enables the MT system to act as both a word recognition and a translation component. This results in better performance than conventional speech translation approaches by recovering from recognition errors with the help of a source language model, translation model and target language model. For this purpose, the MT translation models are adapted to work on source language phones using a grapheme-to-phoneme component. The source-side phonetic confusions are handled using a confusion network. Results on the IWSLT'10 English-Chinese translation task show a significant improvement in translation quality. In this paper, results for HPB-SMT are compared with previously published results of a phrase-based statistical machine translation (PB-SMT) system (Baseline). The HPB-SMT system outperforms PB-SMT in this regard.
Identifying Infrequent Translations by Aligning Non Parallel Sentences
Aligning a sequence of words to one of its infrequent translations is a difficult task. We propose a simple and original solution to this problem that yields significant gains over a state-of-the-art transpotting task. Our approach consists in aligning non parallel sentences from the training data in order to reinforce the alignment models online. We show that using only a few pairs of non parallel sentences significantly improves the alignment of infrequent translations.
Sample Selection for Large-scale MT Discriminative Training
Discriminative training for MT usually involves numerous features and requires a large-scale training set to reach reliable parameter estimation. Other than using the expensive human-labeled parallel corpora for training, semi-supervised methods have been proposed to generate huge amounts of “hallucinated” data, which relieve the data sparsity problem. However, the large training set contains both good samples, which are suitable for training, and bad ones, which are harmful to it. How training samples are selected from a vast amount of data can greatly affect the training performance. In this paper we propose a method for selecting samples that are most suitable for discriminative training according to a criterion measuring the dataset quality. Our experimental results show that by adding samples to the training set selectively, we are able to exceed the performance of a system trained with the same amount of samples selected randomly.
One System, Many Domains: Open-Domain Statistical Machine Translation via Feature Augmentation
In this paper, we introduce a simple technique for incorporating domain information into a statistical machine translation system that significantly improves translation quality when test data comes from multiple domains. Our approach augments (conjoins) standard translation model and language model features with domain indicator features and requires only minimal modifications to the optimization and decoding procedures. We evaluate our method on two language pairs with varying numbers of domains, and observe significant improvements of up to 1.0 BLEU.
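The conjoining of standard features with domain indicators can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature names (`tm`, `lm`), the `name@domain` naming scheme, and the choice to keep a shared copy of each feature next to its domain-specific copy are all assumptions made for the example.

```python
from typing import Dict


def augment_features(features: Dict[str, float], domain: str) -> Dict[str, float]:
    """Conjoin each base feature with a domain indicator.

    Assumption: the shared (domain-independent) copy is kept alongside the
    domain-specific copy, so tuning can learn a general weight plus a
    per-domain correction.
    """
    augmented = dict(features)  # shared copies
    for name, value in features.items():
        augmented[f"{name}@{domain}"] = value  # domain-specific copies
    return augmented


def score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear model score: dot product of weights and feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical feature values and weights, for illustration only.
base = {"tm": -2.0, "lm": -5.0}
weights = {"tm": 1.0, "lm": 0.5, "lm@news": 0.2}

news_score = score(augment_features(base, "news"), weights)
chat_score = score(augment_features(base, "chat"), weights)
```

Because only the feature vector changes, the same hypothesis can receive different scores in different domains while the scoring function itself stays a plain dot product.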
Identification of Fertile Translations in Comparable Corpora: A Morpho-Compositional Approach
This paper defines a method for lexicon extraction in the biomedical domain from comparable corpora. The method is based on compositional translation and exploits morpheme-level translation equivalences. It can generate translations for a large variety of morphologically constructed words and can also generate ‘fertile’ translations. We show that fertile translations increase the overall quality of the extracted lexicon for English to French translation.
Challenges in Predicting Machine Translation Utility for Human Post-Editors
As machine translation quality continues to improve, the idea of using MT to assist human translators becomes increasingly attractive. In this work, we discuss and provide empirical evidence of the challenges faced when adapting traditional MT systems to provide automatic translations for human post-editors to correct. We discuss the differences between this task and traditional adequacy-based tasks and the challenges that arise when using automatic metrics to predict the amount of effort required to post-edit translations. A series of experiments simulating a real-world localization scenario shows that current metrics under-perform on this task, even when tuned to maximize correlation with expert translator judgments, illustrating the need to rethink traditional MT pipelines when addressing the challenges of this translation task.
The Impact of Sentence Alignment Errors on Phrase-Based Machine Translation Performance
When parallel or comparable corpora are harvested from the web, there is typically a tradeoff between the size and quality of the data. In order to improve quality, corpus collection efforts often attempt to fix or remove misaligned sentence pairs. But, at the same time, Statistical Machine Translation (SMT) systems are widely assumed to be relatively robust to sentence alignment errors. However, there is little empirical evidence to support and characterize this robustness. This contribution investigates the impact of sentence alignment errors on a typical phrase-based SMT system. We confirm that SMT systems are highly tolerant to noise, and that performance only degrades seriously at very high noise levels. Our findings suggest that when collecting larger, noisy parallel data for training phrase-based SMT, cleaning up by trying to detect and remove incorrect alignments can actually degrade performance. Although fixing errors, when applicable, is a preferable strategy to removal, its benefits only become apparent for fairly high misalignment rates. We provide several explanations to support these findings.
Pivot Lightly-Supervised Training for Statistical Machine Translation
In this paper, we investigate large-scale lightly-supervised training with a pivot language: We augment a baseline statistical machine translation (SMT) system that has been trained on human-generated parallel training corpora with large amounts of additional unsupervised parallel data; but instead of creating this synthetic data from monolingual source language data with the baseline system itself, or from target language data with a reverse system, we employ a parallel corpus of target language data and data in a pivot language. The pivot language data is automatically translated into the source language, resulting in a trilingual corpus with unsupervised source language side. We augment our baseline system with the unsupervised source-target parallel data. Experiments are conducted for the German-French language pair using the standard WMT newstest sets for development and testing. We obtain the unsupervised data by translating the English side of the English-French 10^9 corpus to German. With careful system design, we are able to achieve improvements of up to +0.4 points BLEU / -0.7 points TER over the baseline.
Interpolated Backoff for Factored Translation Models
We propose interpolated backoff methods to strike the balance between traditional surface form translation models and factored models that decompose translation into lemma and morphological feature mapping steps. We show that this approach improves translation quality by 0.5 BLEU (German–English) over phrase-based models, due to the better translation of rare nouns and adjectives.
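One generic way to balance a surface-form model against a factored model is linear interpolation with a count-dependent weight. The sketch below is an illustration under assumptions, not the scheme from the paper: the weight formula `count / (count + k)` and the constant `k` are hypothetical choices.

```python
def interpolation_weight(source_count: int, k: float = 5.0) -> float:
    """Trust the surface model more the more often the source phrase was seen.

    Assumption: a count / (count + k) weight; k is a hypothetical constant.
    """
    return source_count / (source_count + k)


def interpolated_backoff(p_surface: float, p_factored: float, source_count: int,
                         k: float = 5.0) -> float:
    """Mix surface-form and factored translation probabilities."""
    lam = interpolation_weight(source_count, k)
    return lam * p_surface + (1.0 - lam) * p_factored


# An unseen surface form falls back entirely to the factored model.
unseen = interpolated_backoff(p_surface=0.0, p_factored=0.4, source_count=0)
# A surface form seen five times is weighted equally with the factored model.
rare = interpolated_backoff(p_surface=0.8, p_factored=0.4, source_count=5)
```

With this weighting, rare words lean on the lemma and morphology mapping, while frequent words are translated almost entirely from surface-form statistics.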
Building MT for a Severely Under-Resourced Language: White Hmong
In this paper, we discuss the development of statistical machine translation for English to/from White Hmong (Language code: mww). White Hmong is a Hmong-Mien language, originally spoken mostly in Southeast Asia, but now predominantly spoken by a large diaspora throughout the world, with populations in the United States, Australia, France, Thailand and elsewhere. Building statistical translation systems for Hmong proved to be incredibly challenging since there are no known parallel or monolingual corpora for the language; in fact, finding data for Hmong proved to be one of the biggest challenges to getting the project off the ground. It was only through a close collaboration with the Hmong community, and active and tireless participation of Hmong speakers, that it became possible to build up a critical mass of data to make the translation project a reality. We see this effort as potentially replicable for other severely resource-poor languages of the world, which is likely the case for the majority of the languages still spoken on the planet. Further, the work here suggests that research and work on other severely under-resourced languages can have significant positive impacts for the affected communities, both for accessibility and language preservation.
Phrase-level System Combination for Machine Translation Based on Target-to-Target Decoding
In this paper, we propose a novel lattice-based MT combination methodology that we call Target-to-Target Decoding (TTD). The combination process is carried out as a “translation” from backbone to the combination result. This perspective suggests the use of existing phrase-based MT techniques in the combination framework. We show how phrase extraction rules and confidence estimations inspired from machine translation improve results. We also propose system-specific LMs for estimating N-gram consensus. Our results show that our approach yields a strong improvement over the best single MT system and competes with other state-of-the-art combination systems.
Lost & Found in Translation: Impact of Machine Translated Results on Translingual Information Retrieval
In an ideal cross-lingual information retrieval (CLIR) system, a user query would generate a search over documents in a different language and the relevant results would be presented in the user’s language. In practice, CLIR systems are typically evaluated by judging result relevance in the document language, to factor out the effects of translating the results using machine translation (MT). In this paper, we investigate the influence of four different approaches for integrating MT and CLIR on both retrieval accuracy and user judgment of relevancy. We create a corpus with relevance judgments for both human and machine translated results, and use it to quantify the effect that MT quality has on end-to-end relevance. We find that MT errors result in a 16-39% decrease in mean average precision over the ground truth system that uses human translations. MT errors also caused relevant sentences to appear irrelevant – 5-19% of sentences were relevant in human translation, but were judged irrelevant in MT. To counter this degradation, we present two hybrid retrieval models and two automatic MT post-editing techniques and show that these approaches substantially mitigate the errors and improve the end-to-end relevance.
A Graph-based Strategy to Streamline Translation Quality Assessments
We present a detailed analysis of a graph-based annotation strategy that we employed to annotate a corpus of 11,292 real-world English to Spanish automatic translations with relative (ranking) and absolute (adequate/non-adequate) quality assessments. The proposed approach, inspired by previous work in Interactive Evolutionary Computation and Interactive Genetic Algorithms, results in a simpler and faster annotation process. We empirically compare the method against a traditional, explicit ranking approach, and show that the graph-based strategy: 1) is considerably faster, and 2) produces consistently more reliable annotations.
Machine Translation with Binary Feedback: a Large-Margin Approach
Viewing machine translation as a structured classification problem has provided a gateway for a host of structured prediction techniques to enter the field. In particular, large-margin structured prediction methods for discriminative training of feature weights, such as the structured perceptron or MIRA, have started to match or exceed the performance of existing methods such as MERT. One issue with structured problems in general is the difficulty in obtaining fully structured labels, e.g., in machine translation, obtaining reference translations or parallel sentence corpora for arbitrary language pairs. Another issue, more specific to the translation domain, is the difficulty in online training of machine translation systems, since existing methods often require bilingual knowledge to correct translation output online. We propose a solution to these two problems, by demonstrating a way to incorporate binary-labeled feedback (i.e., feedback on whether a translation hypothesis is a “good” or understandable one or not), a form of supervision that can be easily integrated in an online manner, into a machine translation framework. Experimental results show marked improvement by incorporating binary feedback on unseen test data, with gains exceeding 5.5 BLEU points.
HAL: Challenging Three Key Aspects of IBM-style Statistical Machine Translation
The IBM schemes use weighted cooccurrence counts to iteratively improve translation and alignment probability estimates. We argue that: 1) these cooccurrence counts should be combined differently to capture word correlation; 2) alignment probabilities adopt predictable distributions; and 3) consequently, no iteration is needed. This applies equally well to word-based and phrase-based approaches. The resulting scheme, dubbed HAL, outperforms the IBM scheme in experiments.
Compact Rule Extraction for Hierarchical Phrase-based Translation
This paper introduces two novel approaches for extracting compact grammars for hierarchical phrase-based translation. The first is a combinatorial optimization approach and the second is a Bayesian model over Hiero grammars using Variational Bayes for inference. In contrast to the conventional Hiero (Chiang, 2007) rule extraction algorithm, our methods extract compact models reducing model size by 17.8% to 57.6% without impacting translation quality across several language pairs. The Bayesian model is particularly effective for resource-poor languages with evidence from Korean-English translation. To our knowledge, this is the first alternative to Hiero-style rule extraction that finds a more compact synchronous grammar without hurting translation performance.
Non-linear n-best List Reranking with Few Features
In Machine Translation, it is customary to compute the model score of a predicted hypothesis as a linear combination of multiple features, where each feature assesses a particular facet of the hypothesis. The choice of a linear combination is usually justified by the possibility of efficient inference (decoding); yet, the appropriateness of this simple combination scheme to the task at hand is rarely questioned. In this paper, we propose an approach that replaces the linear scoring function with a non-linear scoring function. To investigate the applicability of this approach, we rescore n-best lists generated with a conventional machine translation engine (using a linear scoring function for generating its hypotheses) with a non-linear scoring function learned using the learning-to-rank framework. Moderate, though consistent, gains in BLEU are demonstrated on the WMT’10, WMT’11 and WMT’12 test sets.
Improved Domain Adaptation for Statistical Machine Translation
We present a simple and effective infrastructure for domain adaptation for statistical machine translation (MT). To build MT systems for different domains, it trains, tunes and deploys a single translation system that is capable of producing adapted domain translations and preserving the original generic accuracy at the same time. The approach unifies automatic domain detection and domain model parameterization into one system. Experimental results on 20 language pairs demonstrate its viability.
Detailed Analysis of Different Strategies for Phrase Table Adaptation in SMT
This paper gives a detailed analysis of different approaches to adapt a statistical machine translation system towards a target domain using small amounts of parallel in-domain data. To this end, we investigate the differences between the approaches addressing adaptation on the two main steps of building a translation model: the candidate selection and the phrase scoring. For the latter step we characterize the differences by four key aspects. We performed experiments on two different tasks of speech translation and analyzed the influence of the different aspects on the overall translation quality. On both tasks we could show significant improvements by using the presented adaptation techniques.
Machine Translation of Labeled Discourse Connectives
This paper shows how the disambiguation of discourse connectives can improve their automatic translation, while preserving the overall performance of statistical MT as measured by BLEU. State-of-the-art automatic classifiers for rhetorical relations are used prior to MT to label discourse connectives that signal those relations. These labels are used for MT in two ways: (1) by augmenting factored translation models; and (2) by using the probability distributions of labels in order to train and tune SMT. The improvement of translation quality is demonstrated using a new semi-automated metric for discourse connectives, on the English/French WMT10 data, while BLEU scores remain comparable to non-discourse-aware systems, due to the low frequency of discourse connectives.
A General Framework to Weight Heterogeneous Parallel Data for Model Adaptation in Statistical MT
The standard procedure to train the translation model of a phrase-based SMT system is to concatenate all available parallel data, to perform word alignment, to extract phrase pairs and to calculate translation probabilities by simple relative frequency. However, parallel data is quite inhomogeneous in many practical applications with respect to several factors like data source, alignment quality, appropriateness to the task, etc. We propose a general framework to take into account these factors during the calculation of the phrase-table, e.g. by better distributing the probability mass of the individual phrase pairs. No additional feature functions are needed. We report results on two well-known tasks: the IWSLT’11 and WMT’11 evaluations, in both conditions translating from English to French. We give detailed results for different functions to weight the bitexts. Our best systems improve a strong baseline by up to one BLEU point without any impact on the computational complexity during training or decoding.
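The idea of redistributing probability mass by weighting bitexts can be illustrated with a weighted relative-frequency estimate. This is a minimal sketch under assumptions: one scalar weight per corpus is only one of the possible weighting functions, and the corpus weights and counts below are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)


def weighted_phrase_table(
    corpora: Iterable[Tuple[float, Dict[PhrasePair, int]]]
) -> Dict[PhrasePair, float]:
    """Estimate p(target | source) from per-corpus counts scaled by corpus weight.

    With all weights equal to 1.0 this reduces to simple relative frequency
    over the concatenated data.
    """
    pair_mass: Dict[PhrasePair, float] = defaultdict(float)
    source_mass: Dict[str, float] = defaultdict(float)
    for weight, counts in corpora:
        for (src, tgt), count in counts.items():
            pair_mass[(src, tgt)] += weight * count
            source_mass[src] += weight * count
    return {
        pair: mass / source_mass[pair[0]]
        for pair, mass in pair_mass.items()
        if source_mass[pair[0]] > 0
    }


# Hypothetical counts: a small in-domain corpus and a larger out-of-domain one.
in_domain = {("bank", "banque"): 3, ("bank", "rive"): 1}
out_of_domain = {("bank", "rive"): 4}

table = weighted_phrase_table([(2.0, in_domain), (0.5, out_of_domain)])
unweighted = weighted_phrase_table([(1.0, in_domain), (1.0, out_of_domain)])
```

In this toy example the weighting raises p(banque | bank) from 0.375 to 0.6 without adding any feature function, since only the counts entering the relative-frequency estimate change.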
Measuring User Productivity in Machine Translation Enhanced Computer Assisted Translation
This paper addresses the problem of reliably measuring productivity gains by professional translators working with a machine translation enhanced computer assisted translation tool. In particular, we report on a field test we carried out with a commercial CAT tool in which translation memory matches were supplemented with suggestions from a commercial machine translation engine. The field test was conducted with 12 professional translators working on real translation projects. Productivity of translators was measured with two indicators, post-editing speed and post-editing effort, on two translation directions, English–Italian and English–German, and two linguistic domains, legal and information technology. Besides a detailed statistical analysis of the experimental results, we also discuss issues encountered in running the test.
Hybrid Machine Translation Using Joint, Binarised Feature Vectors
We present an approach for Hybrid Machine Translation, based on a Machine-Learning framework. Our method combines output from several source systems. We first define an extensible, total order on translations and use it to estimate a ranking on the sentence level for a given set of systems. We introduce and define the notion of joint, binarised feature vectors. We train an SVM-based classifier and show how its classification results can be used to create hybrid translations. We describe a series of oracle experiments on data sets from the WMT11 translation task in order to find an upper bound regarding the achievable level of translation quality. We also present results from first experiments with an implemented version of our system. Evaluation using NIST and BLEU metrics indicates that the proposed method can outperform its individual source systems. An interesting finding is that our approach makes it possible to leverage good translations from otherwise bad systems, as the translation quality estimation is based on sentence-level phenomena rather than corpus-level metrics. We conclude by summarising our findings and by giving an outlook to future work.
Using Automatic Machine Translation Metrics to Analyze the Impact of Source Reformulations
This paper investigates the usefulness of automatic machine translation metrics when analyzing the impact of source reformulations on the quality of machine-translated user generated content. We propose a novel framework to quickly identify rewriting rules which improve or degrade the quality of MT output, by trying to rely on automatic metrics rather than human judgments. We find that this approach allows us to quickly identify overlapping rules between two language pairs (English-French and English-German) and specific cases where the rules’ precision could be improved.
Using Source-Language Transformations to Address Register Mismatches in SMT
Mismatches between training and test data are a ubiquitous problem for real SMT applications. In this paper, we examine a type of mismatch that commonly arises when translating from French and similar languages: available training data is mostly formal register, but test data may well be informal register. We consider methods for defining surface transformations that map common informal language constructions into their formal language counterparts, or vice versa; we then describe two ways to use these mappings, either to create artificial training data or to pre-process source text at run-time. An initial evaluation performed using crowd-sourced comparisons of alternate translations produced by a French-to-English SMT system suggests that both methods can improve performance, with run-time pre-processing being the more effective of the two.
A Poor Man’s Translation Memory Using Machine Translation Evaluation Metrics
We propose straightforward implementations of translation memory (TM) functionality for research purposes, using machine translation evaluation metrics as similarity functions. Experiments under various conditions demonstrate the effectiveness of the approach, but also highlight problems in evaluating the results using an MT evaluation methodology.
A Detailed Analysis of Phrase-based and Syntax-based MT: The Search for Systematic Differences
Rasoul Samad Zadeh Kaljahi
This paper describes a range of automatic and manual comparisons of phrase-based and syntax-based statistical machine translation methods applied to English-German and English-French translation of user-generated content. The syntax-based methods underperform the phrase-based models and the relaxation of syntactic constraints to broaden translation rule coverage means that these models do not necessarily generate output which is more grammatical than the output produced by the phrase-based models. Although the systems generate different output and can potentially be fruitfully combined, the lack of systematic difference between these models makes the combination task more challenging.
Conditional Significance Pruning: Discarding More of Huge Phrase Tables
The technique of pruning phrase tables that are used for statistical machine translation (SMT) can achieve substantial reductions in bulk and improve translation quality, especially for very large corpora such as the Giga-FrEn. This can be further improved by conditioning each significance test on other phrase pair co-occurrence counts resulting in an additional reduction in size and increase in BLEU score. A series of experiments using Moses and the WMT11 corpora for French to English have been performed to quantify the improvement. By adhering strictly to the recommendations for the WMT11 baseline system, a strong reproducible research baseline was employed.
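Basic, unconditioned significance pruning can be sketched as below. The sketch assumes Fisher's exact test on sentence-level co-occurrence counts as the significance test, which the abstract does not name, and it does not reproduce the conditioning on other phrase pair counts. The function names and the threshold value are hypothetical, and the counts are assumed to be mutually consistent.

```python
from math import comb, log


def neg_log_p_value(n: int, c_s: int, c_t: int, c_st: int) -> float:
    """Negative log p-value of a one-sided Fisher's exact test.

    n    : number of sentence pairs in the corpus
    c_s  : sentence pairs whose source side contains the source phrase
    c_t  : sentence pairs whose target side contains the target phrase
    c_st : sentence pairs that contain both phrases

    The p-value is the hypergeometric probability of observing c_st or more
    co-occurrences if the two phrases were independent.
    """
    upper = min(c_s, c_t)
    numerator = sum(
        comb(c_s, k) * comb(n - c_s, c_t - k) for k in range(c_st, upper + 1)
    )
    # Work with logarithms of exact integers to avoid floating-point underflow.
    return log(comb(n, c_t)) - log(numerator)


def keep_pair(n: int, c_s: int, c_t: int, c_st: int, threshold: float = 20.0) -> bool:
    """Keep a phrase pair only if its association is significant enough.

    Assumption: the threshold of 20.0 is an arbitrary illustrative value.
    """
    return neg_log_p_value(n, c_s, c_t, c_st) >= threshold


# Two phrases that always occur together in a 1000-pair corpus are kept;
# two frequent phrases that co-occur at chance level are pruned.
strong = keep_pair(n=1000, c_s=10, c_t=10, c_st=10)
chance = keep_pair(n=1000, c_s=500, c_t=500, c_st=250)
```

Phrase pairs whose co-occurrence is no more frequent than chance would predict get a small negative log p-value and are dropped, which is where the reduction in table size comes from.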
Unsupervised Translation Disambiguation for Cross-Domain Statistical Machine Translation
Most attempts at integrating word sense disambiguation with statistical machine translation have focused on supervised disambiguation approaches. These approaches are of limited use when the distribution of the test data differs strongly from that of the training data; however, word sense errors tend to be especially common under these conditions. In this paper we present different approaches to unsupervised word translation disambiguation and apply them to the problem of translating conversational speech under resource-poor training conditions. Both human and automatic evaluation metrics demonstrate significant improvements resulting from our technique.
Integrating MT with Digital Collections for Multilingual Information Access
Cheng Chieh Lien
This paper describes the role of machine translation (MT) for multilingual information access, a service that is desired by digital libraries that wish to provide cross-cultural access to their collections. To understand the performance of MT, we have developed HeMT: an integrated multilingual evaluation platform (http://txcdk-v10.unt.edu/HeMT/) to facilitate human evaluation of machine translation. The results of human evaluation using HeMT on three online MT services are reported. Challenges and benefits of crowdsourcing and collaboration based on our experience are discussed. Additionally, we present the analysis of the translation errors and propose Multi-engine MT strategies to improve translation performance.
Linguists Love Art and Management Loves Efficiency – Can MT be the Solution?
How to achieve the optimal balance of quality and cost when the need for translation is sky-rocketing? Can machine translation be the solution? What system to choose? Finding the right MT solution for your organization is not easy. In this paper, we would like to share our experience at Nikon Precision Inc. in quest of the right tool, focusing on rule-based Japanese MT software and the results of a small pilot project, together with our plans for the future and the challenges we are facing.
Taking Statistical Machine Translation to the Student Translator
Despite the growth of statistical machine translation (SMT) research and development in recent years, it remains somewhat out of reach for the translation community where programming expertise and knowledge of statistics tend not to be commonplace. While the concept of SMT is relatively straightforward, its implementation in functioning systems remains difficult for most, regardless of expertise. More recently, however, developments such as SmartMATE have emerged which aim to assist users in creating their own customized SMT systems and thus reduce the learning curve associated with SMT. In addition to commercial uses, translator training stands to benefit from such increased levels of inclusion and access to state-of-the-art approaches to MT. In this paper we draw on experience in developing and evaluating a new syllabus in SMT for a cohort of post-graduate student translators: we identify several issues encountered in the introduction of student translators to SMT, and report on data derived from repeated measures questionnaires that aim to capture data on students’ self-efficacy in the use of SMT. Overall, results show that participants report significant increases in their levels of confidence and knowledge of MT in general, and of SMT in particular. Additional benefits – such as increased technical competence and confidence – and future refinements are also discussed.
A User-Based Usability Assessment of Raw Machine Translated Technical Instructions
This paper reports on a project whose aims are to investigate the usability of raw machine translated technical support documentation for a commercial online file storage service. Following the ISO/TR 16982 definition of usability - goal completion, satisfaction, effectiveness, and efficiency - comparisons are drawn for all measures between the original user documentation written in English for a well-known online file storage service and raw machine translated output in four target languages: Spanish, French, German and Japanese. Using native speakers for each language, we found significant differences between the source and MT output for three out of the four measures: goal completion, efficiency and user satisfaction. This leads to a tentative conclusion that there is a difference in usability between well-formed content and raw machine translated content, and we suggest avenues for further work.
What’s Your Pick: RbMT, SMT or Hybrid?
Ruben de la Fuente
All types of Machine Translation technologies have pros and cons. At PayPal, we have been working with MT for 3 years (2 of them in a production environment). The aim of this paper is to share our experience and discuss strengths and weaknesses for Rule-based Machine Translation, Statistical Machine Translation and Hybrid Machine Translation. We will also share pointers for successful implementation of any of these technologies.
Evaluation of Domain Adaptation Techniques for TRANSLI in a Real-World Environment
Statistical Machine Translation (SMT) systems specialized for one domain often perform poorly when applied to other domains. Domain adaptation techniques allow SMT models trained from a source domain with abundant data to accommodate different target domains with limited data. This paper evaluates the performance of two adaptive techniques based on log-linear and mixture models on data from the legal domain in real-world settings. Performance evaluation includes post-editing time and effort required by a professional post-editor to improve the quality of machine-generated translations to meet industry standards, as well as traditional automated scoring techniques (BLEU scores). Results indicate that the domain adaptation techniques can yield a significant increase in BLEU score (up to three points) and a significant reduction in post-editing time of about one second per word in an operational environment.
An LSP Perspective: Business & Process Challenges Implementing MT Solutions: Is MT Delivering Expected Value?
Machine translation resurfaced as a viable business solution about 5 years ago, with much hype. With the amount of content requiring translation, and a mellowing of user expectations about translation quality, it seemed there was real business value in developing machine translation solutions. Since then, however, the discounts offered to enterprise customers have remained stubbornly meager in the 10-20% range, with high, up-front costs—far from the anticipated savings. This paper provides an overview of the challenges encountered in the value chain between customer and Language Service Provider (LSP) which keep translation costs high and limit machine translation adoption, discusses existing and potential solutions to these challenges, and offers suggestions on how to enlist the support of the LSP and freelance translator community to address these challenges.
Translating User-Generated Content in the Social Networking Space
This paper presents a case-study of work done by Applied Language Solutions (ALS) for a large social networking provider who claim to have built the world’s first multi-language social network, where Internet users from all over the world can communicate in languages that are available in the system. In an initial phase, the social networking provider contracted ALS to build Machine Translation (MT) engines for twelve language-pairs: Russian⇔English, Russian⇔Turkish, Russian⇔Arabic, Turkish⇔English, Turkish⇔Arabic and Arabic⇔English. All of the input data is user-generated content, so we faced a number of problems in building large-scale, robust, high-quality engines. Primarily, much of the source-language data is of ‘poor’ or at least ‘non-standard’ quality. This comes in many forms: (i) content produced by non-native speakers, (ii) content produced by native speakers containing non-deliberate typos, or (iii) content produced by native speakers which deliberately departs from spelling norms to bring about some linguistic effect. Accordingly, in addition to the ‘regular’ pre-processing techniques used in the building of our statistical MT systems, we needed to develop routines to deal with all these scenarios. In this paper, we describe how we handle shortforms, acronyms, typos, punctuation errors, non-dictionary slang, wordplay, censor avoidance and emoticons. We demonstrate automatic evaluation scores on the social network data, together with insights from the social networking provider regarding some of the typical errors made by the MT engines, and how we managed to correct these in the engines.
Managing Change when Implementing MT Systems
Managing large-scale MT post-editing projects is a challenging endeavor. From securing linguists’ buy-in to ensuring consistency of the output, it is important to develop a set of specific processes and tools that facilitate this task. Drawing from years of experience in such projects, we will attempt here to describe the challenges associated with the management of such projects and to define best practices.
Beyond MT: Source Content Quality and Process Automation
Patricia Paladini Adell
This document introduces the strategy implemented at CA Technologies to exploit Machine Translation (MT) at the corporate-wide level. We will introduce the different approaches followed to further improve the quality of the output of the machine translation engine once the engines have reached a maximum level of customization. Senior team support, clear communication between the parties involved and improvement measurement are the key components for the success of the initiative.
Incremental Re-Training of a Hybrid English-French MT System with Customer Translation Memory Data
In this paper, we present SAIC’s hybrid machine translation (MT) system and show how it was adapted to the needs of our customer – a major global fashion company. The adaptation was performed in two ways: off-line selection of domain-relevant parallel and monolingual data from a background database, as well as on-line incremental adaptation with customer parallel and translation memory data. The translation memory was integrated into the statistical search using two novel features. We show that these features can be used to produce nearly perfect translations of data that fully or, to a large extent, partially matches the TM entries, without sacrificing the translation quality of the data without TM matches. We also describe how the human post-editing effort was reduced due to significantly better MT quality after adaptation, but also due to improved formatting and readability of the MT output.
Multiplying the Potential of Crowdsourcing with Machine Translation
Patricia Paladini Adell
Machine Translation (MT) is said to be the next lingua franca. With the evolution of new technologies and the capacity to produce a humongous number of written digital documents, human translators will not be able to translate documentation fast enough. However, some applications require a level of quality that is still beyond that provided by MT. Thanks to the increased capacity of communication provided by new technologies, people can also interact and collaborate to work remotely. With this, crowd computing is becoming more common and it has been proposed as a feasible solution for translation. In this paper, we discuss the relationship between crowdsourcing and MT, and the main challenges for the MT community to multiply the potential of the crowd.
Machine Translation as a Global Enterprise Service at Ford
Ford Motor Company is at the forefront of the global economy and with this comes the need for communicating with regional manufacturing staff and plant employees in their own languages. Asian employees, in particular, do not necessarily learn English as a second language as is often the case in European countries, so manufacturing systems are now mandated to support local languages. This support is required for plant floor system applications where static data (labels, menus, and messages) as well as dynamic data (user entered controlled and free text) is required to be translated from/to English and the local languages. This facilitates commonization of business methods where best practices can be shared globally between plant and staff members. In this paper and presentation, we will describe our experiences in bringing Machine Translation technology to a large multinational corporation such as Ford and discuss the lessons we learned as well as both the successes and failures we have experienced.
Using the Microsoft Translator Hub at The Church of Jesus Christ of Latter-day Saints
Stephen D. Richardson
The Church of Jesus Christ of Latter-day Saints undertook an extensive effort at the beginning of this year to deploy machine translation (MT) in the translation workflow for the content on its principal website, www.lds.org. The objective of this effort is to reduce by at least 50% the time required by human translators to translate English content into nine other languages and publish it on this site. This paper documents the experience to date, including selection of the MT system, preparation and use of data to customize the system, initial deployment of the system in the Church’s translation workflow, post-editing training for translators, the resulting productivity improvements, and plans for future deployments.
Automatic Speech Recognition & Hybrid MT for HQ Closed-Captioning & Subtitling for Video Broadcast
We describe a system to rapidly generate high-quality closed captions and subtitles for live broadcast TV shows, using automated components, namely Automatic Speech Recognition and Machine Translation. The human stays in the loop for quality assurance and optional post-editing. We also describe how the system feeds the human edits and corrections back into the different components to improve these components and, with that, the overall system. We finally describe the operation of this system in a real-life environment within a broadcast network, where we implemented the system to transcribe and process broadcast transmissions, generate high-quality closed captions in Arabic, and translate these into English subtitles in a short time.
Spoken Language Translation: Three Business Opportunities
This paper reports on three business opportunities encountered by Spoken Translation, Inc., a developer of software systems for automatic spoken translation: (1) a healthcare organization needing improved communications between limited-English patients and their caregivers; (2) a networking and communications firm aiming to add UN-style simultaneous interpreting to their telepresence facilities; and (3) the retail arm of a device manufacturer hoping to enable more effective in-store consulting for customers with imperfect command of an outlet's native language. None of these openings has yet led to substantial business, but one remains in negotiation. We describe how the business introductions came to us; the proposed use cases; demonstrations, presentations, tests, etc.; and issues/challenges. We also comment on early consumer-oriented products for spoken language translation. The aim is to provide a snapshot of one company's business possibilities and challenges at the dawn of the era of automatic interpreting.
IPTranslator: Facilitating Patent Search with Machine Translation
Joeri Van de Walle
Intellectual Property professionals frequently need to carry out patent searches for a variety of reasons. During a typical search, they will retrieve approximately 30% of their results in a foreign language. The machine translation (MT) options currently available to patent searchers for these foreign-language patents vary in their quality, consistency, and general level of service. In this article, we introduce IPTranslator, an MT web service designed to cater for the needs of patent searchers. At the core of IPTranslator is a set of MT systems developed specifically for translating patent text. We describe the challenges faced in adapting MT technology to such a complex domain, and how the systems were evaluated to ensure that the quality was fit for purpose. Finally, we present the framework through which the IPTranslator service is delivered to users, and the value-adding features which address many of the issues with existing solutions.
U.S. Army Machine Foreign Language Translation System (MFLTS) Capability Update and Review
Michael J. Beaulieu
As early as June 2003, the United States Army partnered with United States Joint Forces Command to review language requirements within the Army, and, to a lesser extent, the other United States Military Services. After review of missions that require language translation, in 2005 the Army completed an Analysis of Alternatives document, which served as an independent assessment of potential language translation alternatives: options and numerical assessments based on each option’s ability to address language translation requirements. Of the four identified alternatives (printed materials, government off the shelf, commercial off the shelf, and overarching program), incremental development of two-way speech and text translation software modules proved to be the most mission and cost effective. That same year, the United States Department of Defense published the Defense Language Transformation Roadmap listing a requirement for a coherent, prioritized, and coordinated multi-language technology research, development and acquisition policy and program. Since 2005, the Army and the Joint Staff have validated requirements for machine foreign language translation capability. In the effort to develop a comprehensive machine foreign translation capability, the Army not only needs to enable software to handle one of the most complex systems that humans deal with, but also needs to develop the architecture and processes to routinely produce and maintain this capability. The Army has made the initial effort, funding a machine foreign language translation program known as the Machine Foreign Language Translation System (MFLTS) Program. It is intended to be the overarching Army Program with Department of Defense interest to provide machine foreign language translation capabilities that meet language translation gaps. MFLTS will provide a basic communications and triage capability for speech and text translations and improve those capabilities as the technology advances.
Capabilities are intended to be delivered through three configurations: over established networks (web based), in mobile (or desktop) configurations and on portable platforms (or man wearable microprocessors and/or handhelds). MFLTS software, as a mission enabler ported on other platforms and systems, will provide Joint, Allied/Coalition units and personnel with language translation capability within the full range of military operations. Most recently, the Army convened a Machine Foreign Language Translation System (MFLTS) General Officer Steering Group (GOSG) in March 2012 and validated follow-on language, domain and technology required capabilities for the Army MFLTS Program beyond the initial capability scheduled for 2014.
Panel Discussion Topic: Return on Investment for Human Language Technology in the U.S. Government
Government agencies are investing in MT to boost production, but the future funding picture is uncertain. Decision makers (Congress, OMB, IC leadership) want evidence (quantitative/qualitative) of value for investments. Agencies can use positive ROIs to defend MT investment budgets, plans, and programs, but the information needs to be more than anecdotal.
Language and Translation Challenges in Social Media
The explosive growth of social media has led to a wide range of new challenges for machine translation and language processing. The language used in social media occupies a new space between structured and unstructured media, formal and informal language, and dialect and standard usage. Yet these new platforms have given a digital voice to millions of users on the Internet, giving them the opportunity to communicate on the first truly global stage – the Internet. Social media covers a broad category of communications formats, ranging from threaded conversations on Facebook, to microblog and short message content on platforms like Twitter and Weibo – but it also includes user-generated comments on YouTube, as well as the contents of the video itself, and even includes ‘traditional’ blogs and forums. The common thread linking all of these is that the media is generated by, and targeted at, individuals. This talk will survey some of the most popular social media platforms, and identify key challenges in translating the content found in them – including dialect, code switching, mixed encodings, the use of “internet speak”, and platform-specific language phenomena, as well as volume and genre. In addition, we will talk about some of the challenges in analyzing social media from an operational point of view, and how language and translation issues influence higher-level analytic processes such as entity extraction, topic classification and clustering, geo-spatial analysis and other technologies that enable comprehension of social media. These latter capabilities are being adapted for social media analytics for US Government analysts under the support of the Technical Support Working Group at the US DoD, enabling translingual comprehension of this style of content in an operational environment.
Producing Data for Under-Resourced Languages: A Dari-English Parallel Corpus of Multi-Genre Text
Developers producing language technology for under-resourced languages often find relatively little machine-readable text for the data required to train machine translation systems. Typically, the kinds of text that are most accessible for production of parallel data are news and news-related genres, yet the language that requires translation for analysts and decision-makers reflects a broad range of forms and contents. The proposed paper will describe an effort funded by the ODNI FLPO in which the Army Research Laboratory, assisted by MITRE language technology researchers, produced a Dari-English parallel corpus containing text in a variety of styles and genres that more closely resemble the kinds of documents needed by government users than do traditional news genres. The data production effort began with a survey of Dari documents catalogued in a government repository of material obtained from the field in Afghanistan. Because the documents in the repository are not available for creation of parallel corpora, the goal was to quantify the types of documents in the collection and identify their linguistic features in order to find documents that are similar. Document images were obtained from two sources: (1) the Preserving and Creating Access to Unique Afghan Records collection, an online resource produced by the University of Arizona Libraries and the Afghanistan Centre at Kabul University and (2) The University of Nebraska Arthur Paul Afghanistan Collection. For the latter, document images were obtained by camera capture of books and by selecting pdf images of microfiche records. A set of 1395 document page images was selected to provide 250,000 translated English words in 10 content domains. The images were transcribed and translated according to specifications designed to maximize the quality and usefulness of the data.
The corpus will be used to create a Dari-English glossary, and an experiment will quantify improvements to Dari-English translation of multi-genre text when a generic Dari-English machine translation system is customized using the corpus. The proposed paper will present highlights from these efforts.
Machine Translation Revisited: An Operational Reality Check
The government and the research community have strived for the past few decades to develop machine translation capabilities. Historically, DARPA took the lead in the grand challenge aiming at surpassing human translation quality. While we have made strides from rule based, to statistical and hybrid machine translation engines, we cannot rely solely on machine translation to overcome the language barrier and accomplish the mission. Machine Translation is often misunderstood or misplaced in the operational settings as expectations are unrealistic and optimization not achieved. With the increase in volume, variety and velocity of data, new paradigms are needed when choosing machine translation software and embedding it into a business process so as to achieve the operational goals. The talk will focus on the operational requirements and frame where, when and how to use machine translation. We will also outline some gaps and suggest new areas for research, development, and implementation.
Sharpening the Claws on CAT Tools: Increase Quality & Production, Maximize Limited Resources
Making the right connections hinges on linking data from disparate sources. Frequently the link may be a person or place, so something as simple as a mistranslated name will cause a search to miss relevant documents. To swiftly and accurately exploit a growing flood of foreign language information acquired for the defense of the nation, Intelligence Community (IC) linguists and analysts need assistance in both translation accuracy and productivity. The name translation and standardizing component of a Computer-Aided Translation (CAT) tool such as the Highlight language analysis suite ensures fast and reliable translation of names from Arabic, Dari, Farsi, and Pashto according to a number of government transliteration standards. Highlight improves efficiency and maximizes the utilization of scarce human resources.
Strategies in Developing Engine-specific Chinese-English User Parallel Corpora
This paper proposes some strategies and techniques for creating phrase-level user parallel corpora for the Systran translation engine. Though not all strategies and techniques discussed here will apply to other translation engines, the concept will.
Government Catalog of Language Resources (GCLR)
The purpose of this presentation is to discuss recent efforts within the government to address issues of evaluation and return on investment. Pressure to demonstrate value has increased with the growing amount of foreign language information available, with the variety of languages needing to be exploited, and with the increasing gaps between numbers of language-enabled people and the amount of work to be done. This pressure is only growing as budgets shrink, and as global development grows. Over the past year, the ODNI has led an effort to pull together different government stakeholders to determine some baseline standards for determining Return on Investment via task-based evaluation. Stakeholder consensus on major HLT tasks has involved examination of the different approaches to determining return on investment and how it relates to the use of HLT in the workflow. In addition to reporting on the goals and progress of this group, we will present future directions and invite community input.
A New Method for Automatic Translation Scoring-HyTER
It is common knowledge that translation is an ambiguous, 1-to-n mapping process, but to date, our community has produced no empirical estimates of this ambiguity. We have developed an annotation tool that enables us to create representations that compactly encode an exponential number of correct translations for a sentence. Our findings show that naturally occurring sentences have billions of translations. Having access to such large sets of meaning-equivalent translations enables us to develop a new metric, HyTER, for translation accuracy. We show that our metric provides better estimates of machine and human translation accuracy than alternative evaluation metrics using data from the most recent Open MT NIST evaluation and we discuss how HyTER representations can be used to inform a data-driven inquiry into natural language semantics.
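To see how a compact representation can encode very many translations, consider a toy sketch in which a sentence is a sequence of slots and each slot lists interchangeable phrasings. Everything here is an assumption made for illustration: the slot data is invented, and scoring a hypothesis by its word-level edit distance to the closest encoded sentence is a simplification, not the HyTER definition from the paper.

```python
from itertools import product
from math import prod

# Toy slot network: each slot lists interchangeable phrasings (invented data).
SLOTS = [
    ["the talks", "the negotiations"],
    ["ended", "concluded", "finished"],
    ["yesterday", "on the previous day"],
]


def count_translations(slots):
    """Number of encoded sentences: the product of the slot sizes."""
    return prod(len(alternatives) for alternatives in slots)


def expand(slots):
    """Lazily enumerate every encoded sentence as a list of words."""
    for choice in product(*slots):
        yield " ".join(choice).split()


def edit_distance(a, b):
    """Word-level Levenshtein distance using a single rolling row."""
    row = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        prev_diag, row[0] = row[0], i
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            prev_diag, row[j] = row[j], min(
                row[j] + 1,        # deletion
                row[j - 1] + 1,    # insertion
                prev_diag + cost,  # substitution or match
            )
    return row[-1]


def closest_distance(hypothesis, slots):
    """Smallest edit distance from the hypothesis to any encoded sentence."""
    words = hypothesis.split()
    return min(edit_distance(words, reference) for reference in expand(slots))
```

With three slots of sizes 2, 3, and 2 the network already encodes 12 sentences, and the count multiplies with every added slot. Exhaustive enumeration as in `expand` is only workable for toy inputs; sets with billions of members would need a search over the network itself.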
Return on Investment for Government Human Language Technology Systems
Over the years, the government has translated reams of material, transcribed decades of audio, and processed years of text. Where is that material now? How valuable would it be to have that material available to push research and applications and to support foreign language training? Over 20 years ago, DARPA funded the Linguistic Data Consortium (LDC) at the University of Pennsylvania to collect, catalog, store and provide access to language resources. Since that time, the LDC has collected thousands of corpora in many different genres and languages. Although the government has access to the full range of LDC data through a community license, until recently corpora specific to government needs were usually deleted soon after they were created. In order to address the need for a government-only catalog and repository, the Government Catalog of Language Resources was funded through the ODNI, and an initial prototype has been built. The GCLR will be transferred to a government executive agent who will be responsible for making improvements, adding corpora, and maintaining and sustaining the effort. The purpose of this talk is to present the model behind GCLR, to demonstrate its purpose, and to invite attendees to contribute and use contents. Background leading up to the current version will be presented. Use cases of parallel corpora in teaching, technology development and language maintenance will also be covered. Learning from the LDC on how corpora are used, and linking with the LDC will be part of future directions to enable government applications to utilize these resources.
Evaluating Parallel Corpora: Assessing Utility in Government Translation Memory (TM) Systems
Translation memory (TM) software allows a user to leverage previously translated material in the form of parallel corpora to improve the quality, efficiency, and consistency of future translation work. Within the intelligence community (IC), one of the major bottlenecks in implementing TM systems is developing a relevant parallel corpus. In particular, the IC needs to explore methods of deploying open source corpora for use with TM systems in a classified setting. To address this issue we are devising automated metrics for comparing various corpora in order to predict their usefulness to serve as vaults for particular translation needs. The proposed methodology will guide the use of these corpora, as well as the selection and optimization of novel corpora. One of the critical factors in TM vault creation is optimizing the trade-off between vault size and domain-specificity. Although a larger corpus may be more likely to contain material that matches words or phrases in the material to be translated, there is a danger that some of the proposed matches may include translations that are inappropriate for a given context. If the material in the vault and the material to be translated cover similar domains, the matches provided by the vault may be more likely to occur in the appropriate context. To explore this trade-off we are developing and implementing computational similarity metrics (e.g., n-gram overlap, TF-IDF) for comparison of corpora covering 12 different domains. We are also examining summary statistics produced by TM systems to test the degree to which material from each domain serves as a useful vault for translating material from each of the other domains, as well as the degree to which vault size improves the number and quality of proposed matches. 
The results of this research will help translation managers and other users assess the utility of a given parallel corpus for their particular translation needs, and may ultimately lead to improved tagging within TM systems to help translators identify the most relevant matches. Use of open source materials allows tool developers and users to leverage existing corpora, thus holding the promise of driving down costs of vault creation and selection. Optimizing vaults also promises to improve the quality, efficiency, and consistency of translation processes and products.
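The two similarity metrics named above (n-gram overlap and TF-IDF) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: corpora are assumed to be pre-tokenized word lists, overlap is measured as Jaccard similarity of n-gram sets, each corpus is treated as one TF-IDF document, and the smoothed IDF formula is one common variant.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Set of n-grams (as tuples) occurring in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(corpus_a, corpus_b, n=2):
    """Jaccard overlap between the n-gram sets of two tokenized corpora."""
    a, b = ngrams(corpus_a, n), ngrams(corpus_b, n)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tfidf_cosine(corpora):
    """Pairwise cosine similarity of TF-IDF vectors, one vector per corpus."""
    counts = [Counter(corpus) for corpus in corpora]
    n_docs = len(corpora)
    # Document frequency: in how many corpora each term appears.
    df = Counter(term for count in counts for term in count)
    # Smoothed IDF, so terms present in every corpus keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in count.items()} for count in counts]

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    return [[cosine(u, v) for v in vectors] for u in vectors]
```

Calling `tfidf_cosine` on one token list per domain yields a square matrix whose off-diagonal entries could be compared against match statistics from a TM system to check whether a higher similarity score really predicts a more useful vault.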
The Conundrum of New Online Languages: Translating Arabic Chat
Online communications are playing an unprecedented role in propelling the revolutionary changes that are sweeping throughout the Middle East. A significant portion of that communication is in Romanized Arabic chat (Arabizi), which uses a combination of numerals and Roman characters, as well as non-Arabic words, to write Arabic in place of conventional Arabic script. Language purists in the Arabic-speaking world are lamenting that the use of Arabizi is becoming so profound that it is “destroying the Arabic language.” Despite its widespread use, and significant effect on emerging societies, Government agencies and others have been unable to extract any useful data from Arabizi because of its unconventional characteristics. Therefore, they have had to rely on human, computer-savvy translators, who often are a burden on dwindling resources, and are easily overwhelmed by the sheer volume of incoming data. Our presentation will explore the challenges of triaging and analyzing the Romanized Arabic format and describe Basis Technology’s Arabic chat translation software. This system will convert, for instance, mo2amrat, mo2amaraat, or mou’amret to مؤامرات. The output of standard Arabic can then be exploited for relevant information with a full set of other tools that will index/search, carry out linguistic analyses, extract entities, translate/transliterate names, and machine translate from the Arabic into English or other languages. Because of the nature of Arabizi – writers are able to express themselves in their native Arabic dialects, something that is not so easily done with Modern Standard Arabic – there is a bonus feature in that now we are also able to identify the probable geographical origins of each writer, something that is of great intelligence value. 
Looking at real-world scenarios, we will discuss how the chat translator can be built into solutions for users to overcome technological, linguistic, and cultural obstacles to achieve operational success and complete tasks.
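One reason spelling variants such as mo2amrat, mo2amaraat, and mou’amret can map to the same Arabic word is that they share a consonant skeleton once vowels are ignored. The sketch below illustrates only that idea. It is not Basis Technology's method: the lexicon, the function names, and the treatment of an apostrophe as the digit 2 are assumptions of this example, and the single lexicon entry is taken from the example in the abstract.

```python
# Hypothetical lexicon keyed by consonant skeleton; the single entry comes
# from the example in the abstract.
LEXICON = {"m2mrt": "مؤامرات"}

VOWELS = set("aeiou")
# Both apostrophe forms are treated like the digit "2" (an assumption here).
APOSTROPHES = {"'": "2", "\u2019": "2"}


def skeleton(word: str) -> str:
    """Reduce a Romanized spelling to its consonant-and-digit skeleton."""
    chars = (APOSTROPHES.get(ch, ch) for ch in word.lower())
    return "".join(ch for ch in chars if ch not in VOWELS)


def to_arabic(word: str):
    """Look up the Arabic-script form, or return None if the skeleton is unknown."""
    return LEXICON.get(skeleton(word))
```

A skeleton lookup conflates distinct words that differ only in their vowels, so a realistic converter would need context or a statistical model to choose among candidates.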
Reversing the Palladius Mapping of Chinese Names in Russian Text
We present the Reverse Palladius (RevP) program developed by the Air Force Research Laboratory's Speech and Communication Research, Engineering, Analysis, and Modeling (SCREAM) Laboratory for the National Air and Space Intelligence Center (NASIC). The RevP program assists the linguist in correcting the transliteration of Mandarin Chinese names during the Russian to English translation process. Chinese names cause problems for transliteration, because Russian writers follow a specific Palladius mapping for Chinese sounds. Typical machine translation of Russian into English then applies standard transliteration of the Russian sounds in these names, producing errors that require hand-correction. For example, the Chinese name Zhai Zhigang is written in Cyrillic as Чжай Чжиган, and standard transliteration via Systran renders this into English as Chzhay Chzhigan. In contrast, the RevP program uses rules that reverse the Palladius mapping, yielding the correct form Zhai Zhigang. When using the RevP program, the linguist opens a Russian document and selects a Chinese name for transliteration. The rule-based algorithm proposes a reverse Palladius transliteration, as well as a stemmed option if the word terminates in a possible Russian inflection. The linguist confirms the appropriate version of the name, and the program both corrects the current instance and stores the information for future use. The resulting list of name mappings can be used to pre-translate names in new documents, either via stand-alone operation of the RevP program, or through compilation of the list as a Systran user dictionary. The RevP program saves time by removing the need for post-editing of Chinese names, and improves consistency in the translation of these names. The user dictionary becomes more useful over time, further reducing the time required for translation of new documents.
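The rule-based reversal can be illustrated with a small sketch. It is not the RevP program itself: the syllable table is a four-entry subset chosen to cover the example above, the list of Russian endings is hypothetical, and greedy longest-match segmentation is an assumption about how such rules might be applied.

```python
# Tiny illustrative subset of Palladius syllables mapped to Hanyu Pinyin.
PALLADIUS_TO_PINYIN = {
    "чжай": "zhai",
    "чжи": "zhi",
    "гань": "gan",
    "ган": "gang",
}
# Hypothetical list of Russian case endings to try stripping.
INFLECTIONS = ("ом", "а", "у", "е")
MAX_LEN = max(len(key) for key in PALLADIUS_TO_PINYIN)


def reverse_palladius(word):
    """Greedy longest-match conversion; returns None if any part is not covered."""
    text, pos, out = word.lower(), 0, []
    while pos < len(text):
        for size in range(min(MAX_LEN, len(text) - pos), 0, -1):
            syllable = text[pos:pos + size]
            if syllable in PALLADIUS_TO_PINYIN:
                out.append(PALLADIUS_TO_PINYIN[syllable])
                pos += size
                break
        else:
            # No table entry matches at this position.
            return None
    return "".join(out).capitalize()


def propose(word):
    """Return candidates: the full word first, then variants with an ending removed."""
    candidates = []
    full = reverse_palladius(word)
    if full:
        candidates.append(full)
    for ending in INFLECTIONS:
        if word.lower().endswith(ending) and len(word) > len(ending):
            stemmed = reverse_palladius(word[:-len(ending)])
            if stemmed and stemmed not in candidates:
                candidates.append(stemmed)
    return candidates
```

Here `propose("Чжигана")` finds no full-word match but offers the stemmed form, mirroring the step in which the linguist confirms the appropriate version of the name.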
Introduction to Machine Translation
This tutorial is for people who are beginning to evaluate how well machine translation will fit their needs or who are curious to know more about how it is used. We assume no previous knowledge of machine translation. We focus on background knowledge that will help you both get more out of the rest of AMTA2010 and make better decisions about how to invest in machine translation. Past participants have ranged from tech writers and freelance translators who want to keep up to date to VPs and CEOs who are evaluating technology strategies for their organizations. The main topics for discussion are common FAQs about MT (Can machines really translate? Can we fire our translators now?) and limitations (Why is the output so bad? What is MT good for?), workflow (Why buy MT if it’s free on the internet? What other kinds of translation automation are there? How do we use it?), return on investment (How much does MT cost? How can we convince our bosses to buy MT?), and steps to deployment (Which MT system should we buy? What do we do next?).
Increasing Localization Efficiency with SYSTRAN Hybrid MT Products
John Paul Barraza
This session will cover how to increase localization efficiency with a SYSTRAN desktop product and a server solution. First we will demonstrate how to integrate MT in a localization workflow, interaction with TM matching tools, hands-on MT customization using various tools and dictionaries, and final post-editing using SYSTRAN Premium Translator, a desktop product. We will also walk through the complete cycle of automatic quality improvement using SYSTRAN Training Server, part of the Enterprise Server 7 suite. It covers managing bilingual and monolingual data using Corpus Manager, training hybrid or statistical translation models with Training Manager, and evaluating quality using automatic scoring and side-by-side translation comparison. It also includes other useful tools that automatically extract and validate dictionary entries, and create TMs from unaligned bilingual sentences. Finally, localization efficiency with or without MT integration/customization is compared with the actual cost benefits.
MT and Arabic Language Issues
Arabic poses many interesting challenges to machine translation: ambiguous orthography, rich morphology, complex morpho-syntactic behavior, and numerous dialects. In this tutorial, we introduce the most important themes of challenges and solutions for people working on translation from/to Arabic or any of its dialects. The tutorial is intended for researchers and developers working on MT. The discussion of linguistic issues and how they are addressed in MT will help linguists and professional translators understand the issues machine translation faces when dealing with Arabic and other morphologically rich languages. The tutorial does not expect the attendees to be able to speak/read/write Arabic.
Using MT in Today’s CAT Tools
This tutorial will present a survey of how machine translation is integrated into current CAT tools and illustrate how the technology can be used appropriately and profitably by the professional translator.
Open Source Statistical Machine Translation
If you are interested in open-source machine translation but lack hands-on experience, this is the tutorial for you! We will start with background knowledge of statistical machine translation and then walk you through the process of installing and running an SMT system. We will show you how to prepare input data, and the most efficient way to train and use your translation systems. We shall also discuss solutions to some of the most common issues that face LSPs when using SMT, including how to tailor systems to specific clients, preserving document layout and formatting, and efficient ways of incorporating new translation memories. Previous years’ participants have included software engineers and managers who need to have a detailed understanding of the SMT process. This is a fast-paced, hands-on tutorial that will cover the skills you need to get you up and running with open-source SMT. The teaching will be based on the Moses toolkit, the most popular open-source machine translation software currently available. No prior knowledge of MT is necessary, only an interest in it. A laptop is required for this tutorial, and you should have rudimentary knowledge of using the command line on Windows or Linux.
Practical Domain Adaptation in SMT
Several studies have recently reported significant productivity gains by human translators when, besides translation memory (TM) matches, they also receive suggestions from a statistical machine translation (SMT) engine. In fact, an increasing number of language service providers and in-house translation services of large companies are now integrating SMT into their workflows. The transfer of state-of-the-art SMT technology from research to industry has been relatively fast and simple, thanks in part to the development of open-source software such as Moses, GIZA++, and IRSTLM. While a translator is working on a specific translation project, she evaluates the utility of translating versus post-editing a segment based on the adequacy and fluency provided by the SMT engine, which in turn depend on the language pair considered, the linguistic domain of the task, and the amount of available training data. Statistical models, like those employed in SMT, rely on a simple assumption: the data used to train and tune the models represent the target translation task. Unfortunately, this assumption cannot be satisfied in most real application cases, simply because for most language pairs and domains there is not sufficient data to adequately train an SMT system. Hence, common practice is to train SMT systems by merging parallel and monolingual data from the target domain with as much data as possible from any other available source. This workaround is simple and gives practical benefits, but it is often not the best way to exploit the available data. This tutorial addresses the optimal use of in-domain and out-of-domain data to achieve better SMT performance on a given application domain. Domain adaptation, in general, refers to statistical modeling and machine learning techniques that try to cope with the unavoidable mismatch between training and task data that typically occurs in real-life applications.
Our tutorial will survey several application cases in which domain adaptation can be applied and present the adaptation techniques that best fit each case. In particular, we will cover adaptation methods for n-gram language models and translation models in phrase-based SMT. The tutorial will provide some high-level theoretical background in domain adaptation, discuss practical application cases, and finally show how the presented methods can be applied with two widely used software tools: Moses and IRSTLM. The tutorial is suited for any practitioner of statistical machine translation. No particular programming or mathematical background is required.
Workshop on Post-Editing Technology and Practice
The CRITT TPR-DB 1.0: A Database for Empirical Human Translation Process Research
This paper introduces a publicly available database of recorded translation sessions for Translation Process Research (TPR). User activity data (UAD) of translators' behavior was collected over the past 5 years in several translation studies with Translog, a data acquisition program which logs keystrokes and gaze data during text reception and production. The database compiles this data into a consistent format which can be processed by various visualization and analysis tools.
Post-editing time as a measure of cognitive effort
Post-editing machine translations has been attracting increasing attention both as a common practice within the translation industry and as a way to evaluate Machine Translation (MT) quality via edit distance metrics between the MT and its post-edited version. Commonly used metrics such as HTER are limited in that they cannot fully capture the effort required for post-editing. Particularly, the cognitive effort required may vary for different types of errors and may also depend on the context. We suggest post-editing time as a way to assess some of the cognitive effort involved in post-editing. This paper presents two experiments investigating the connection between post-editing time and cognitive effort. First, we examine whether sentences with long and short post-editing times involve edits of different levels of difficulty. Second, we study the variability in post-editing time and other statistics among editors.
Average Pause Ratio as an Indicator of Cognitive Effort in Post-Editing: A Case Study
Gregory M. Shreve
Pauses are known to be good indicators of cognitive demand in monolingual language production and in translation. However, a previous effort by O’Brien (2006) to establish an analogous relationship in post-editing did not produce the expected result. In this case study, we introduce a metric for pause activity, the average pause ratio, which is sensitive to both the number and duration of pauses. We measured cognitive effort in a segment by counting the number of complete editing events. We found that the average pause ratio was higher for less cognitively demanding segments than for more cognitively demanding segments. Moreover, this effect became more pronounced as the minimum threshold for pause length was shortened.
Reliably Assessing the Quality of Post-edited Translation Based on Formalized Structured Translation Specifications
Alan K. Melby
Paul J. Fields
Post-editing of machine translation has become more common in recent years. This has created the need for a formal method of assessing the performance of post-editors in terms of whether they are able to produce post-edited target texts that follow project specifications. This paper proposes the use of formalized structured translation specifications (FSTS) as a basis for post-editor assessment. To determine if potential evaluators are able to reliably assess the quality of post-edited translations, an experiment used texts representing the work of five fictional post-editors. Two software applications were developed to facilitate the assessment: the Ruqual Specifications Writer, which aids in establishing post-editing project specifications; and Ruqual Rubric Viewer, which provides a graphical user interface for constructing a rubric in a machine-readable format. Seventeen non-experts rated the translation quality of each simulated post-edited text. Intraclass correlation analysis showed evidence that the evaluators were highly reliable in evaluating the performance of the post-editors. Thus, we assert that using FSTS specifications applied through the Ruqual software tools provides a useful basis for evaluating the quality of post-edited texts.
Learning to Automatically Post-Edit Dropped Words in MT
Automatic post-editors (APEs) can improve adequacy of MT output by detecting and reinserting dropped content words, but the location where these words are inserted is critical. In this paper, we describe a probabilistic approach for learning reinsertion rules for specific languages and MT systems, as well as a method for synthesizing training data from reference translations. We test the insertion logic on MT systems for Chinese to English and Arabic to English. Our adaptive APE is able to insert within 3 words of the best location 73% of the time (32% in the exact location) in Arabic-English MT output, and 67% of the time in Chinese-English output (30% in the exact location), and delivers improved performance on automated adequacy metrics over a previous rule-based approach to insertion. We consider how specific aspects of the insertion problem make it particularly amenable to machine learning solutions.
SmartMATE: An Online End-To-End MT Post-Editing Framework
It is a well-known fact that the amount of content available to be translated and localized far exceeds the current supply of translation resources. Automation in general, and Machine Translation (MT) in particular, are among the key technologies which can help improve this situation. However, a tool that integrates all of the components needed for the localization process is still missing, and MT is still out of reach for most localisation professionals. In this paper we present an online translation environment which empowers users with MT by enabling engines to be created from their data, without a need for technical knowledge or special hardware requirements and at low cost. Documents in a variety of formats can then be post-edited after being processed with their Translation Memories, MT engines and glossaries. We give an overview of the tool and present a case study of a project for a large games company, showing the applicability of our tool.
To post-edit or not to post-edit? Estimating the benefits of MT post-editing for a European organization
In the last few years the European Parliament has witnessed a significant increase in translation demand. Although Translation Memory (TM) tools, terminology databases and bilingual concordancers have provided significant leverage in terms of quality and productivity, the European Parliament is in need of advanced language technology to keep successfully facing the challenge of multilingualism. This paper describes an ongoing large-scale machine translation post-editing evaluation campaign, the purpose of which is to estimate the business benefits from the use of machine translation for the European Parliament. This paper focuses mainly on the design, the methodology and the tools used by the evaluators, but it also presents some preliminary results for the following language pairs: Polish-English, Danish-English, Lithuanian-English, English-German and English-French.
How Good Is Crowd Post-Editing? Its Potential and Limitations
This paper is a partial report of a research effort on evaluating the effect of crowd-sourced post-editing. We first discuss the emerging trend of crowd-sourced post-editing of machine translation output, along with its benefits and drawbacks. Second, we describe the pilot study we have conducted on a platform that facilitates crowd-sourced post-editing. Finally, we present our plans for further studies to gain more insight into how effective crowd-sourced post-editing is.
Error Detection for Post-editing Rule-based Machine Translation
The increasing role of post-editing as a way of improving machine translation output and a faster alternative to translating from scratch has lately attracted researchers’ attention, and various approaches have been proposed to facilitate the task. We experiment with a method to provide support for the post-editing task through error detection. A deep linguistic error analysis was done of a sample of English sentences translated from Portuguese by two Rule-based Machine Translation systems. We designed a set of rules to deal with various systematic translation errors and implemented a subset of these rules covering the errors of tense and number. The evaluation of these rules showed a satisfactory performance. In addition, we performed an experiment with human translators which confirmed that highlighting translation errors during post-editing can help the translators perform the post-editing task up to 12 seconds per error faster and improve their efficiency by minimizing the number of missed errors.
Machine Translation Infrastructure and Post-editing Performance at Autodesk
In this paper, we present the Moses-based infrastructure we developed and use as a productivity tool for the localisation of software documentation and user interface (UI) strings at Autodesk into twelve languages. We describe the adjustments we have made to the machine translation (MT) training workflow to suit our needs and environment, our server environment and the MT Info Service that handles all translation requests and allows the integration of MT in our various localisation systems. We also present the results of our latest post-editing productivity test, where we measured the productivity gain for translators post-editing MT output versus translating from scratch. Our analysis of the data indicates the presence of a strong correlation between the amount of editing applied to the raw MT output by the translators and their productivity gain. In addition, within the last calendar year our system has processed over thirteen million tokens of documentation content of which we have a record of the performed post-editing. This has allowed us to evaluate the performance of our MT engines for the different languages across our product portfolio, as well as spotlight potential issues with MT in the localisation process.
Fourth Workshop on Computational Approaches to Arabic-Script-based Languages
Translating English Discourse Connectives into Arabic: a Corpus-based Analysis and an Evaluation Metric
Discourse connectives can often signal multiple discourse relations, depending on their context. The automatic identification of the Arabic translations of seven English discourse connectives shows how these connectives are differently translated depending on their actual senses. Automatic labelling of English source connectives can help a machine translation system to translate them more correctly. The corpus-based analysis of Arabic translations also enables the definition of a connective-specific evaluation metric for machine translation, which is here validated by human judges on sample English/Arabic translation data.
Idiomatic MWEs and Machine Translation. A Retrieval and Representation Model: the AraMWE Project
A preliminary implementation of AraMWE, a hybrid project that includes a statistical component and a CCG symbolic component to extract and treat MWEs and idioms in Arabic and English parallel texts is presented, together with a general sketch of the system, a thorough description of the statistical component and a proof of concept of the CCG component.
Developing an Open-domain English-Farsi Translation System Using AFEC: Amirkabir Bilingual Farsi-English Corpus
Seyyed Mohammad Mohammadzadeh Ziabary
The translation quality of Statistical Machine Translation (SMT) depends on the amount of input data, especially for morphologically rich languages. Farsi (Persian) is such a language, and it has few NLP resources. It also suffers from non-standard written characters, which cause a large variety in the written form of each character. Moreover, the structural difference between Farsi and English results in long-range reorderings which cannot be modeled by common SMT reordering models. Here, we try to improve the existing English-Farsi SMT system by focusing on these challenges, first by expanding our limited-domain bilingual corpus to an open-domain one. Then, to alleviate the character variations, a new text normalization algorithm is offered. Finally, some hand-crafted rules are applied to reduce the structural differences. Using the new corpus, the experimental results showed an 8.82% BLEU improvement from applying the new normalization method and 9.1% BLEU when the rules are used.
ARNE - A tool for Named Entity Recognition from Arabic Text
In this paper, we study the problem of finding named entities in Arabic text. For this task we present the development of our pipeline software for Arabic named entity recognition (ARNE), which includes tokenization, morphological analysis, Buckwalter transliteration, part of speech tagging and named entity recognition of person, location and organisation named entities. In our first attempt to recognize named entities, we used a simple, fast and language-independent gazetteer lookup approach. In our second attempt, we used the morphological analysis provided by our pipeline to remove affixes and observed an improvement in performance. The pipeline presented in this paper can be used in future as a basis for a named entity recognition system that recognizes named entities not only using gazetteers, but also making use of morphological information and part of speech tagging.
Approaches to Arabic Name Transliteration and Matching in the DataFlux Quality Knowledge Base
Brant N. Kay
Brian C. Rineer
This paper discusses a hybrid approach to transliterating and matching Arabic names, as implemented in the DataFlux Quality Knowledge Base (QKB), a knowledge base used by data management software systems from SAS Institute, Inc. The approach to transliteration relies on a lexicon of names with their corresponding transliterations as its primary method, and falls back on Perl regular expression rules to transliterate any names that do not exist in the lexicon. Transliteration in the QKB is bi-directional; the technology transliterates Arabic names written in the Arabic script to the Latin script, and transliterates Arabic names written in the Latin script to Arabic. Arabic name matching takes a similar approach and relies on a lexicon of Arabic names and their corresponding transliterations, falling back on phonetic transliteration rules to transliterate names into the Latin script. All names are ultimately rendered in the Latin script before matching takes place. Thus, the technology is capable of matching names across the Arabic and Latin scripts, as well as within the Arabic script or within the Latin script. The goal of the authors of this paper was to build a software system capable of transliterating and matching Arabic names across scripts with an accuracy deemed to be acceptable according to internal software quality standards.
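The lexicon-first, rules-as-fallback strategy described in this abstract can be illustrated with a minimal sketch. Everything below is hypothetical: the one-entry lexicon, the consonant-only rule table, and the name `transliterate` are invented for illustration and are not taken from the QKB, whose actual rules are written as Perl regular expressions.

```python
import re

# Hypothetical toy lexicon: Arabic-script name -> preferred Latin transliteration.
LEXICON = {
    "\u0645\u062d\u0645\u062f": "Muhammad",  # meem, hah, meem, dal
}

# Hypothetical fallback rules, applied in order. Real rule sets would also
# handle vowels, context, and multi-letter sequences; these cover only a
# handful of consonants.
FALLBACK_RULES = [
    (re.compile("\u0645"), "m"),  # meem
    (re.compile("\u062d"), "h"),  # hah
    (re.compile("\u062f"), "d"),  # dal
    (re.compile("\u0633"), "s"),  # seen
    (re.compile("\u0646"), "n"),  # noon
]


def transliterate(name: str) -> str:
    """Return the lexicon entry if one exists, else apply the regex rules."""
    if name in LEXICON:
        return LEXICON[name]
    result = name
    for pattern, replacement in FALLBACK_RULES:
        result = pattern.sub(replacement, result)
    return result
```

With this toy data, a name found in the lexicon receives its curated spelling, while an unlisted name such as the three letters hah, seen, noon falls through to the rules and comes out as a bare consonant skeleton. That gap in quality is one plausible reason for making the lexicon the primary method.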
Using Arabic Transliteration to Improve Word Alignment from French-Arabic Parallel Corpora
In this paper, we focus on the use of Arabic transliteration to improve the results of a linguistics-based word alignment approach from parallel text corpora. This approach uses, on the one hand, a bilingual lexicon, named entities, cognates and grammatical tags to align single words, and on the other hand, syntactic dependency relations to align compound words. We have evaluated the word aligner integrating Arabic transliteration using two methods: a manual evaluation of the alignment quality and an evaluation of the impact of this alignment on the translation quality by using the Moses statistical machine translation system. The obtained results show that Arabic transliteration improves the quality of both alignment and translation.
Preprocessing Egyptian Dialect Tweets for Sentiment Mining
Research on Arabic sentiment analysis is very limited and still in its early stages compared to other languages like English, whether at document level or sentence level. In this paper, we test the effect of preprocessing (normalization, stemming, and stop word removal) on the performance of an Arabic sentiment analysis system using Arabic tweets from Twitter. The sentiment (positive or negative) of the crawled tweets is analyzed to interpret the attitude of the public with regard to a topic of interest. Using Twitter as the main source of data reflects the importance of the system for the Middle East region, which mostly speaks Arabic.
Rescoring N-Best Hypotheses for Arabic Speech Recognition: A Syntax-Mining Approach
Improving speech recognition accuracy through linguistic knowledge is a major research area in automatic speech recognition systems. In this paper, we present a syntax-mining approach to rescore N-Best hypotheses for Arabic speech recognition systems. The method depends on a machine learning tool (WEKA-3-6-5) to extract the N-Best syntactic rules of the Baseline tagged transcription corpus, which was tagged using the Stanford Arabic tagger. The proposed method was tested using the Baseline system, which contains a pronunciation dictionary of 17,236 vocabulary items (28,682 words and variants) from a 7.57-hour pronunciation corpus of modern standard Arabic (MSA) broadcast news. Using the Carnegie Mellon University (CMU) PocketSphinx speech recognition engine, the Baseline system achieved a Word Error Rate (WER) of 16.04% on a test set of 400 utterances (about 0.57 hours) containing 3585 diacritized words. Even though there were enhancements in some tested files, we found that this method does not lead to significant enhancement (for Arabic). Based on this research work, we conclude this paper by introducing a new design for language models to account for longer-distance constraints, instead of a few preceding words.
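For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, assuming whitespace tokenization; the function name `word_error_rate` is illustrative, and the abstract does not specify the scoring tool actually used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "a x c" against the reference "a b c d" involves one substitution and one deletion over four reference words, giving a WER of 0.5. Because insertions are counted, the value can exceed 1.0 for very long hypotheses.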
Morphological Segmentation and Part of Speech Tagging for Religious Arabic
We annotate a small corpus of religious Arabic with morphological segmentation boundaries and fine-grained segment-based part of speech tags. Experiments on both segmentation and POS tagging show that the religious corpus-trained segmenter and POS tagger outperform the Arabic Treebank-trained ones although the latter is 21 times as big, which shows the need for building religious Arabic linguistic resources. The small corpus we annotate improves segmentation accuracy by 5% absolute (from 90.84% to 95.70%), and POS tagging by 9% absolute (from 82.22% to 91.26%) when using gold standard segmentation, and by 9.6% absolute (from 78.62% to 88.22%) when using automatic segmentation.
Exploiting Wikipedia as a Knowledge Base for the Extraction of Linguistic Resources: Application on Arabic-French Comparable Corpora and Bilingual Lexicons
Lamia Hadrich Belguith
We present simple and effective methods for extracting comparable corpora and bilingual lexicons from Wikipedia. We exploit the large scale and the structure of Wikipedia articles to extract two resources that will be very useful for natural language applications. We build a comparable corpus from Wikipedia using categories as topic restrictions, and we extract bilingual lexicons from inter-language links aligned with a statistical method or a combined statistical and linguistic method.
Workshop on Monolingual Machine Translation
Improving English to Spanish Out-of-Domain Translations by Morphology Generalization and Generation
José B. Mariño
This paper presents a detailed study of a method for morphology generalization and generation to address out-of-domain translations in English-to-Spanish phrase-based MT. The paper studies whether the morphological richness of the target language causes poor quality translation when translating out-of-domain. In detail, this approach first translates into Spanish simplified forms and then predicts the final inflected forms through a morphology generation step based on shallow and deep-projected linguistic information available from both the source and target-language sentences. Obtained results highlight the importance of generalization, and therefore generation, for dealing with out-of-domain data.
Monolingual Data Optimisation for Bootstrapping SMT Engines
Content localisation via machine translation (MT) is a sine qua non, especially for international online business. While most applications utilise rule-based solutions due to the lack of suitable in-domain parallel corpora for statistical MT (SMT) training, in this paper we investigate the possibility of applying SMT where huge amounts of monolingual content only are available. We describe a case study where an analysis of a very large amount of monolingual online trading data from eBay is conducted by ALS with a view to reducing this corpus to the most representative sample in order to ensure the widest possible coverage of the total data set. Furthermore, minimal yet optimal sets of sentences/words/terms are selected for generation of initial translation units for future SMT system-building.
Shallow and Deep Paraphrasing for Improved Machine Translation Parameter Optimization
Dennis N. Mehay
String comparison methods such as BLEU (Papineni et al., 2002) are the de facto standard in MT evaluation (MTE) and in MT system parameter tuning (Och, 2003). It is difficult for these metrics to recognize legitimate lexical and grammatical paraphrases, which is important for MT system tuning (Madnani, 2010). We present two methods to address this: a shallow lexical substitution technique and a grammar-driven paraphrasing technique. Grammatically precise paraphrasing is novel in the context of MTE, and demonstrating its usefulness is a key contribution of this paper. We use these techniques to paraphrase a single reference, which, when used for parameter tuning, leads to superior translation performance over baselines that use only human-authored references.
Two stage Machine Translation System using Pattern-based MT and Phrase-based SMT
We have developed a two-stage machine translation (MT) system. The first stage consists of an automatically created pattern-based machine translation system (PBMT), and the second stage consists of a standard phrase-based statistical machine translation (SMT) system. We evaluated the system on the Japanese-English simple sentence task. First, we obtained English sentences from Japanese sentences using an automatically created Japanese-English pattern-based machine translation system. We call the English sentences obtained in this way “English”. Second, we applied a standard SMT system (Moses) to the results. This means that we translated the “English” sentences into English by SMT. We also conducted ABX tests (Clark, 1982) to compare the outputs of the standard SMT system (Moses) with those of the proposed system for 100 sentences. The experimental results indicated that 30 sentences output by the proposed system were evaluated as being better than those output by the standard SMT system, whereas 9 sentences output by the standard SMT system were judged to be better than those output by the proposed system. This means that our proposed system functioned effectively in the Japanese-English simple sentence task.
Improving Word Alignment by Exploiting Adapted Word Similarity
Septina Dian Larasati
This paper presents a method to improve a word alignment model in a phrase-based Statistical Machine Translation system for a low-resourced language using a string similarity approach. Our method captures similar words that can be seen as semi-monolingual across languages, such as numbers, named entities, and adapted/loan words. We use several string similarity metrics to measure the monolinguality of the words, such as Longest Common Subsequence Ratio (LCSR), Minimum Edit Distance Ratio (MEDR), and we also use a modified BLEU Score (modBLEU). Our approach is to add intersecting alignment points for word pairs that are orthographically similar, before applying a word alignment heuristic, to generate a better word alignment. We demonstrate this approach on an Indonesian-to-English translation task, where the languages share many similar words that are poorly aligned given limited training data. This approach gives a statistically significant improvement of up to 0.66 in terms of BLEU score.
Addressing some Issues of Data Sparsity towards Improving English-Manipuri SMT using Morphological Information
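As an illustration of one of the similarity metrics named in this abstract, LCSR is commonly defined as the length of the longest common subsequence of two strings divided by the length of the longer string. The sketch below follows that common definition and may differ in detail from the variant used in the paper; the function names and the example word pair are chosen for illustration only.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)  # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio: LCS length over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # assumption: two empty strings are not treated as similar
    return lcs_length(a, b) / longest
```

For the Indonesian and English pair "universitas" and "university", the longest common subsequence is "universit" (9 characters) and the longer word has 11 characters, so the ratio is 9/11, roughly 0.82. A pipeline along the lines described above might treat word pairs scoring over some threshold as candidate alignment points; that threshold would be a tuning choice and is not given here.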
Thoudam Doren Singh
The performance of an SMT system depends heavily on the availability of large parallel corpora. The unavailability of these resources in the required amount for many language pairs is a challenging issue. The required size of the resource is considerably larger for SMT systems involving morphologically rich and highly agglutinative languages. This paper investigates some of the issues in enriching the resources for this kind of language. Handling of inflectional and derivational morphemes of the morphologically rich target language plays an important role in the enrichment process. Mapping from the source to the target side is carried out for the English-Manipuri SMT task using a factored model. The SMT system developed shows improvement in performance over the baseline system in terms of both automatic scoring and subjective evaluation.
Statistical Machine Translation for Depassivizing German Part-of-speech Sequences
We aim to use statistical machine translation technology to correct grammar errors and style issues in monolingual text. Here, as a feasibility test, we focus on depassivization in German and we abstract from surface forms to parts of speech. Our results are not yet satisfactory but yield useful insights into directions for improvement.