Machine Translation Summit (2011)



Proceedings of Machine Translation Summit XIII: Papers

Methods for Smoothing the Optimizer Instability in SMT
Mauro Cettolo | Nicola Bertoldi | Marcello Federico

Training Machine Translation with a Second-Order Taylor Approximation of Weighted Translation Instances
Aaron Phillips | Ralf Brown

Maximum Rank Correlation Training for Statistical Machine Translation
Daqi Zheng | Yifan He | Yang Liu | Qun Liu

POS Tagging of English Particles for Machine Translation
Jianjun Ma | Degen Huang | Haixia Liu | Wenfeng Sheng

Multi-stage Chinese Dependency Parsing Based on Dependency Direction
Wenjing Lang | Qiaoli Zhou | Guiping Zhang | Dongfeng Cai

Statistic Machine Translation Boosted with Spurious Word Deletion
Shujie Liu | Chi-Ho Li | Ming Zhou

Phonetic Representation-Based Speech Translation
Jie Jiang | Zeeshan Ahmed | Julie Carson-Berndsen | Peter Cahill | Andy Way

Unsupervised Vocabulary Selection for Domain-Independent Simultaneous Lecture Translation
Paul Maergner | Ian Lane | Alex Waibel

Context-aware Language Modeling for Conversational Speech Translation
Avneesh Saluja | Ian Lane | Ying Zhang

Incremental Training and Intentional Over-fitting of Word Alignment
Qin Gao | Will Lewis | Chris Quirk | Mei-Yuh Hwang

Alignment Inference and Bayesian Adaptation for Machine Translation
Kevin Duh | Katsuhito Sudoh | Tomoharu Iwata | Hajime Tsukada

Multi-Strategy Approaches to Active Learning for Statistical Machine Translation
Vamshi Ambati | Stephan Vogel | Jaime Carbonell

Document-level Consistency Verification in Machine Translation
Tong Xiao | Jingbo Zhu | Shujie Yao | Hao Zhang

Function Word Generation in Statistical Machine Translation Systems
Lei Cui | Dongdong Zhang | Mu Li | Ming Zhou

Multimodal Building of Monolingual Dictionaries for Machine Translation by Non-Expert Users
Miquel Esplà-Gomis | Víctor M. Sánchez-Cartagena | Juan Antonio Pérez-Ortiz

Automatic Post-Editing based on SMT and its selective application by Sentence-Level Automatic Quality Evaluation
Hirokazu Suzuki

Qualitative Analysis of Post-Editing for High Quality Machine Translation
Frédéric Blain | Jean Senellart | Holger Schwenk | Mirko Plitt | Johann Roturier

Using machine translation in computer-aided translation to suggest the target-side words to change
Miquel Esplà-Gomis | Felipe Sánchez-Martínez | Mikel L. Forcada

A Unified SMT Framework Combining MIRA and MERT
Shujie Liu | Chi-Ho Li | Ming Zhou

Improving Phrase Extraction via MBR Phrase Scoring and Pruning
Nan Duan | Mu Li | Ming Zhou | Lei Cui

Phrase Segmentation Model using Collocation and Translational Entropy
Hyoung-Gyu Lee | Joo-Young Lee | Min-Jeong Kim | Hae-Chang Rim | Joong-Hwi Shin | Young-Sook Hwang

Singular or Plural? Exploiting Parallel Corpora for Chinese Number Prediction
Elizabeth Baran | Nianwen Xue

Handling Multiword Expressions in Phrase-Based Statistical Machine Translation
Santanu Pal | Tanmoy Chakraborty | Sivaji Bandyopadhyay

Automatic Error Analysis for Morphologically Rich Languages
Ahmed El Kholy | Nizar Habash

MT use within the enterprise: Encouraging adoption via a unified MT API
Raymond Flournoy

Deploying MT into a Localisation Workflow: Pains and Gains
Yanli Sun | Juan Liu | Yi Li

Evaluation of MT Systems to Translate User Generated Content
Johann Roturier | Anthony Bensadoun

A Unified and Discriminative Soft Syntactic Constraint Model for Hierarchical Phrase-based Translation
Lemao Liu | Tiejun Zhao | Chao Wang | Hailong Cao

Simple but Effective Approaches to Improving Tree-to-tree Model
Feifei Zhai | Jiajun Zhang | Yu Zhou | Chengqing Zong

Unpacking and Transforming Feature Functions: New Ways to Smooth Phrase Tables
Boxing Chen | Roland Kuhn | George Foster | Howard Johnson

Identification and Translation of Significant Patterns for Cross-Domain SMT Applications
Han-Bin Chen | Hen-Hsen Huang | Jengwei Tjiu | Ching-Ting Tan | Hsin-Hsi Chen

Domain Adaptation in Statistical Machine Translation of User-Forum Data using Component Level Mixture Modelling
Pratyush Banerjee | Sudip Kumar Naskar | Johann Roturier | Andy Way | Josef van Genabith

Bagging-based System Combination for Domain Adaption
Linfeng Song | Haitao Mi | Yajuan Lü | Qun Liu

Extracting Pre-ordering Rules from Chunk-based Dependency Trees for Japanese-to-English Translation
Xianchao Wu | Katsuhito Sudoh | Kevin Duh | Hajime Tsukada | Masaaki Nagata

Statistical Post-Editing for a Statistical MT System
Hanna Bechara | Yanjun Ma | Josef van Genabith

Post-ordering in Statistical Machine Translation
Katsuhito Sudoh | Xianchao Wu | Kevin Duh | Hajime Tsukada | Masaaki Nagata

Searching Translation Memories for Paraphrases
Masao Utiyama | Graham Neubig | Takashi Onishi | Eiichiro Sumita

Are numbers good enough for you? - A linguistically meaningful MT evaluation method
Takako Aikawa | Spencer Rarrick

Marker-based Chunking for Analogy-based Translation of Chunks
Kota Takeya | Yves Lepage

A Comparison of Unsupervised Bilingual Term Extraction Methods Using Phrase-Tables
Masamichi Ideue | Kazuhide Yamamoto | Masao Utiyama | Eiichiro Sumita

Improving Low-Resource Statistical Machine Translation with a Novel Semantic Word Clustering Algorithm
Jeff Ma | Spyros Matsoukas | Richard Schwartz

Multi-granularity Word Alignment and Decoding for Agglutinative Language Translation
Zhiyang Wang | Yajuan Lü | Qun Liu

Improving the Hierarchical Phrase-Based Translation Model
Xiaodong Shi | Xiang Zhu | Yidong Chen

Lexical-based Reordering Model for Hierarchical Phrase-based Machine Translation
Zhongguang Zheng | Yao Meng | Hao Yu

Effective Use of Discontinuous Phrases for Hierarchical Phrase-based Translation
Wei Wei | Bo Xu

Generating Virtual Parallel Corpus: A Compatibility Centric Method
Jia Xu | Weiwei Sun

Parallel Corpus Refinement as an Outlier Detection Algorithm
Kaveh Taghipour | Shahram Khadivi | Jia Xu

MT Detection in Web-Scraped Parallel Corpora
Spencer Rarrick | Chris Quirk | Will Lewis

On the Expressivity of Linear Transductions
Markus Saers | Dekai Wu | Chris Quirk

Handheld Machine Translation System Based on Constraint Synchronous Grammar
Fai Wong | Francisco Oliveira | Sam Chao | Chi-Wai Tang

A Comparison Study of Parsers for Patent Machine Translation
Isao Goto | Masao Utiyama | Takashi Onishi | Eiichiro Sumita

Rich Linguistic Features for Translation Memory-Inspired Consistent Translation
Yifan He | Yanjun Ma | Andy Way | Josef van Genabith

Japanese-Chinese Phrase Alignment Using Common Chinese Characters Information
Chenhui Chu | Toshiaki Nakazawa | Sadao Kurohashi

The Cultivation of a Chinese-English-Japanese Trilingual Parallel Corpus from Comparable Patents
Bin Lu | Ka Po Chow | Benjamin K. Tsou

Evaluation Methodology and Results for English-to-Arabic MT
Olivier Hamon | Khalid Choukri

Example-Based Machine Translation for Low-Resource Language Using Chunk-String Templates
Md. Anwarus Salam Khan | Setsuo Yamada | Tetsuro Nishino

Improve SMT with Source-Side “Topic-Document” Distributions
Zhengxian Gong | Guodong Zhou | Liangyou Li

Predicting Machine Translation Adequacy
Lucia Specia | Najeh Hajlaoui | Catalina Hallett | Wilker Aziz

Getting Expert Quality from the Crowd for Machine Translation Evaluation
Luisa Bentivogli | Marcello Federico | Giovanni Moretti | Michael Paul

A Framework for Diagnostic Evaluation of MT Based on Linguistic Checkpoints
Sudip Kumar Naskar | Antonio Toral | Federico Gaspari | Andy Way

Comparative Evaluation of Term Informativeness Measures in Machine Translation Evaluation Metrics
Billy Wong | Chunyu Kit

System Combination for Machine Translation Based on Text-to-Text Generation
Wei-Yun Ma | Kathleen McKeown

Hybrid Machine Translation Guided by a Rule–Based System
Cristina España-Bonet | Gorka Labaka | Arantza Díaz de Ilarraza | Lluís Màrquez

Integrating shallow-transfer rules into phrase-based statistical machine translation
Víctor M. Sánchez-Cartagena | Felipe Sánchez-Martínez | Juan Antonio Pérez-Ortiz

Hypergraph Training and Decoding of System Combination in SMT
Yupeng Liu | Tiejun Zhao | Sheng Li

Study on the Impact Factors of the Translators’ Post-editing Efficiency in a Collaborative Translation Environment
Na Ye | Guiping Zhang

UTX 1.11, a Simple and Open User Dictionary/Terminology Standard, and its Effectiveness with Multiple MT Systems
Seiji Okura | Yuji Yamamoto | Hajime Ito | Michael Kato | Miwako Shimazu

Real-time Multi-media Translation for Healthcare: a Usability Study
Mark Seligman | Mike Dillinger



Proceedings of Machine Translation Summit XIII: Tutorial Abstracts

Syntactic SMT and Semantic SMT
Dekai Wu

Over the past twenty years, we have attacked the historical methodological barriers between statistical machine translation and traditional models of syntax, semantics, and structure. In this tutorial, we will survey some of the central issues and techniques from each of these aspects, with an emphasis on “deeply theoretically integrated” models rather than hybrid approaches such as superficial statistical aggregation or system combination of outputs produced by traditional symbolic components. On syntactic SMT, we will explore the trade-offs for SMT between learnability and representational expressiveness. After establishing a foundation in the theory and practice of stochastic transduction grammars, we will examine very recent approaches to the automatic unsupervised induction of various classes of transduction grammars. We will show why stochastic linear transduction grammars (LTGs), linear inversion transduction grammars (LITGs), and their preterminalized variants (PLITGs) are proving to be particularly intriguing models for bootstrapping the induction of full-fledged stochastic inversion transduction grammars (ITGs). On semantic SMT, we will explore the trade-offs involved in applying various lexical semantics models to SMT. We will first examine word sense disambiguation (WSD) and discuss why traditional WSD models that are not deeply integrated within the SMT model tend, surprisingly, to fail. In contrast, we will show how a deeply embedded phrase sense disambiguation (PSD) approach succeeds where traditional WSD does not. We will then turn to semantic role labeling (SRL) and discuss the challenges faced by early approaches to applying SRL models to SMT. Finally, on semantic MT evaluation, we will explore some very recent human and semi-automatic metrics based on semantic frame agreement.
We show that by keeping the metrics deeply grounded within the theoretical framework of semantic frames, the new HMEANT and MEANT metrics can significantly outperform even the state-of-the-art expensive HTER and TER metrics, while at the same time maintaining the desirable characteristics of simplicity, inexpensiveness, and representational transparency.
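The abstract names ITGs without showing which reorderings they can express. As a minimal illustration (our own sketch, not code from the tutorial), the function below checks whether a word reordering can be derived by a binary ITG, i.e. built bottom-up by combining adjacent blocks in straight or inverted order. The name `is_itg_permutation` and the encoding of a reordering as a list of 1-based target positions, one per source word, are assumptions. It uses greedy shift-reduce merging of adjacent blocks that cover contiguous target positions.

```python
def is_itg_permutation(perm: list[int]) -> bool:
    """Return True if the reordering `perm` is derivable by a binary ITG.

    perm[i] is the (1-based) target position of source word i.
    """
    # Each stack entry is a block: the (low, high) range of target positions it covers.
    stack: list[tuple[int, int]] = []
    for position in perm:
        stack.append((position, position))  # shift one word as its own block
        # Reduce while the two topmost blocks cover adjacent target ranges.
        while len(stack) >= 2:
            lo1, hi1 = stack[-2]
            lo2, hi2 = stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:  # straight [A B] or inverted <A B>
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    # Derivable iff everything reduced to a single block spanning the sentence.
    return len(stack) == 1


print(is_itg_permutation([3, 4, 1, 2]))  # True
print(is_itg_permutation([2, 4, 1, 3]))  # False
```

The rejected example is one of the two "inside-out" patterns (2413 and 3142) that binary ITGs cannot generate; this restriction is one side of the expressiveness trade-off the abstract refers to.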

From the Confidence Estimation of Machine Translation to the Integration of MT and Translation Memory
Yanjun Ma | Yifan He | Josef van Genabith

In this tutorial, we cover techniques that facilitate the integration of Machine Translation (MT) and Translation Memory (TM), which can help the adoption of MT technology in the localisation industry. The tutorial covers five parts: i) a brief introduction to MT and TM systems, ii) MT confidence estimation measures tailored for the TM environment, iii) segment-level MT and TM integration, iv) sub-segment-level MT and TM integration, and v) human evaluation of MT and TM integration. We will first briefly describe and compare how translations are generated in MT and TM systems, and suggest possible avenues for combining these two systems. We will also cover current quality/cost estimation measures applied in MT and TM systems, such as the fuzzy-match score in the TM and the evaluation/confidence metrics used to judge MT outputs. We then move on to introduce recent developments in MT confidence estimation tailored towards predicting post-editing effort. We will especially focus on the confidence metrics proposed by Specia et al., which have been shown to correlate highly with human preference as well as with post-editing time. For segment-level MT and TM integration, we present translation recommendation and translation re-ranking models, where the integration happens at the 1-best or the N-best level, respectively. Given an input to be translated, MT-TM recommendation compares the outputs of the MT and TM systems and presents the better one to the post-editor. MT-TM re-ranking, on the other hand, combines k-best lists from both systems and generates a new list according to estimated post-editing effort. We observe high precision of these models in automatic and human evaluations, indicating that they can be integrated into TM environments without the risk of deteriorating the quality of the post-editing candidate. For sub-segment-level MT and TM integration, we try to reuse high-quality TM chunks to improve the quality of MT systems.
We can also predict whether phrase pairs derived from fuzzy matches should be used to constrain the translation of an input segment. Using a series of linguistically motivated features, our constraints lead both to more consistent translation output and to improved translation quality, as measured by automatic evaluation scores. Finally, we present several methodologies that can be used to track post-editing effort, perform human evaluation of MT-TM integration, or help translators to access MT outputs in a TM environment.
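The segment-level quantities in this abstract can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the presenters' models: `fuzzy_match_score` uses one common token-level definition (1 minus word edit distance divided by the longer segment's length), whereas commercial TM tools use their own variants; and each candidate's estimated post-editing effort is taken as a given number from a hypothetical estimator, whereas the tutorial's recommendation and re-ranking models learn this from confidence-estimation features. All names (`Candidate`, `recommend`, `rerank`) are illustrative.

```python
from dataclasses import dataclass


def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over tokens (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,                     # delete tok_a
                curr[j - 1] + 1,                 # insert tok_b
                prev[j - 1] + (tok_a != tok_b),  # substitute (free if tokens match)
            )
        prev = curr
    return prev[-1]


def fuzzy_match_score(source: str, tm_source: str) -> float:
    """Assumed definition: 1 - word edit distance / length of the longer segment."""
    a, b = source.split(), tm_source.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - word_edit_distance(a, b) / longest


@dataclass
class Candidate:
    text: str
    origin: str        # "MT" or "TM"
    est_effort: float  # estimated post-editing effort from a hypothetical estimator; lower is better


def recommend(mt: Candidate, tm: Candidate) -> Candidate:
    """1-best recommendation: show the candidate with lower estimated effort (TM wins ties)."""
    return tm if tm.est_effort <= mt.est_effort else mt


def rerank(mt_kbest: list[Candidate], tm_kbest: list[Candidate]) -> list[Candidate]:
    """N-best re-ranking: merge both k-best lists and sort by estimated effort."""
    return sorted(mt_kbest + tm_kbest, key=lambda c: c.est_effort)


print(round(fuzzy_match_score("click the Save button", "click the Open button"), 2))  # 0.75
mt = Candidate("Click Save.", "MT", est_effort=0.2)
tm = Candidate("Click Open.", "TM", est_effort=0.4)
print(recommend(mt, tm).origin)  # MT
```

Breaking ties in favour of the TM candidate is an arbitrary choice here; it reflects the assumption that post-editors are already used to TM suggestions.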

Evaluating the Output of Machine Translation Systems
Alon Lavie

This half-day tutorial provides a broad overview of how to evaluate translations produced by machine translation systems. The issues covered include a survey of both human evaluation measures and commonly used automated metrics, and a review of how these are used for various types of evaluation tasks, such as assessing the translation quality of MT-translated sentences, comparing the performance of alternative MT systems, or measuring the productivity gains of incorporating MT into translation workflows.
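The abstract mentions automated metrics only in general terms. As one widely used example (our own sketch, not material from the tutorial), the function below computes unsmoothed sentence-level BLEU against a single reference: clipped n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. Whitespace tokenization and the function names are assumptions; BLEU is normally reported at corpus level by summing n-gram counts over all sentences before combining them, and toolkits differ in tokenization and smoothing.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed BLEU of one hypothesis against one reference."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by how often it occurs in the reference.
        matched = sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision_sum / max_n)


print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(round(sentence_bleu("the cat sat on", "the cat sat on the mat"), 4))  # 0.6065
```

The second example has perfect n-gram precision but is two words short, so its score comes entirely from the brevity penalty, exp(1 - 6/4).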

Productive Use of MT in Localization
Mirko Plitt

Localization is a term mainly used in the software industry to designate the adaptation of products to meet local market needs. At the center of this process lies the translation of the most visible part of the product (the user interface) and of the product documentation. Not surprisingly, the localization industry has long been an extensive consumer of translation technology and a key contributor to its progress. Software products are typically released in recurrent cycles, with large amounts of content remaining unchanged or undergoing only minor modifications from one release to the next. In addition, software development cycles are short, forcing translation to start while the product is still changing so that localized products can reach global markets in a timely fashion. Together, these two aspects make localization heavily dependent on the efficient handling of translation updates. It is only natural that the software industry turned to software-based productivity tools to automate the recycling of translations (through translation memories) and to support the management of the translation workflow (through translation management systems). Machine translation is a relatively recent addition to the localization technology mix and is not yet as widely adopted as one would expect. Its initial use in the software industry was for ancillary content that would otherwise often be left untranslated, e.g. product support articles and antivirus alerts with their short lifecycles. The expectation, however, had always been that MT could one day be deployed on the bulk of user interface and product documentation content, due to the expected process efficiencies and cost savings. While MT is generally still not considered “good” enough to be used raw on this type of content, it has now become an integral part of translation productivity environments, thereby transforming translators into post-editors.
The tutorial will provide an overview of current localization practices and challenges, with a special focus on the role of translation memory and translation management technologies. As a use case of the integration of MT in such an environment, we will then present the approach taken by Autodesk with its large set of Moses engines trained on custom data. Finally, we will explore typical scenarios in which machine translation is employed in the localization industry, using practical examples and data gathered in different productivity and usability tests.