Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper gives an overview of the evaluation campaign results of the International1Workshop on Spoken Language Translation (IWSLT) 2009 . In this workshop, we focused on the translation of task-oriented human dialogs in travel situations. The speech data was recorded through human interpreters, where native speakers of different languages were asked to complete certain travel-related tasks like hotel reservations using their mother tongue. The translation of the freely-uttered conversation was carried out by human interpreters. The obtained speech data was annotated with dialog and speaker information. The translation directions were English into Chinese and vice versa for the Challenge Task, and Arabic, Chinese, and Turkish, which is a new edition, into English for the standard BTEC Task. In total, 18 research groups participated in this year’s event. Automatic and subjective evaluations were carried out in order to investigate the impact of task-oriented human dialogs on automatic speech recognition (ASR) and machine translation (MT) system performance, as well as the robustness of state-of-the-art MT systems for speech-to-speech translation in a dialog scenario.
In this paper, we describe the techniques that are explored in the AppTek system to enhance the translations in the Turkish to English track of IWSLT09. The submission was generated using a phrase-based statistical machine translation system. We also researched the usage of morpho-syntactic information and the application of word reordering in order to improve the translation results. The results are evaluated based on BLEU and METEOR scores. We show that the usage of morpho-syntactic information yields 3 BLEU points gain in the overall system.
This paper describes the Barcelona Media SMT system in the IWSLT 2009 evaluation campaign. The Barcelona Media system is an statistical phrase-based system enriched with source context information. Adding source context in an SMT system is interesting to enhance the translation in order to solve lexical and structural choice errors. The novel technique uses a similarity metric among each test sentence and each training sentence. First experimental results of this technique are reported in the Arabic and Chinese Basic Traveling Expression Corpus (BTEC) task. Although working in a single domain, there are ambiguities in SMT translation units and slight improvements in BLEU are shown in both tasks (Zh2En and Ar2En).
In this paper, we give a description of the Machine Translation (MT) system developed at DCU that was used for our fourth participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2009). Two techniques are deployed in our system in order to improve the translation quality in a low-resource scenario. The first technique is to use multiple segmentations in MT training and to utilise word lattices in decoding stage. The second technique is used to select the optimal training data that can be used to build MT systems. In this year’s participation, we use three different prototype SMT systems, and the output from each system are combined using standard system combination method. Our system is the top system for Chinese–English CHALLENGE task in terms of BLEU score.
This paper reports on the participation of FBK at the IWSLT 2009 Evaluation. This year we worked on the Arabic-English and Turkish-English BTEC tasks with a special effort on linguistic preprocessing techniques involving morphological segmentation. In addition, we investigated the adaptation problem in the development of systems for the Chinese-English and English-Chinese challenge tasks; in particular, we explored different ways for clustering training data into topic or dialog-specific subsets: by producing (and combining) smaller but more focused models, we intended to make better use of the available training data, with the ultimate purpose of improving translation quality.
This year’s GREYC translation system is an improved translation memory that was designed from scratch to experiment with an approach whose goal is just to improve over the output of a standard translation memory by making heavy use of sub-sentential alignments in a restricted case of translation by analogy. The tracks the system participated in are all BTEC tracks: Arabic to English, Chinese to English, and Turkish to English.
In this paper, we describe the system and approach used by the Institute for Infocomm Research (I2R) for the IWSLT 2009 spoken language translation evaluation campaign. Two kinds of machine translation systems are applied, namely, phrase-based machine translation system and syntax-based machine translation system. To test syntax-based machine translation system on spoken language translation, variational systems are explored. On top of both phrase-based and syntax-based single systems, we further use rescoring method to improve the individual system performance and use system combination method to combine the strengths of the different individual systems. Rescoring is applied on each single system output, and system combination is applied on all rescoring outputs. Finally, our system combination framework shows better performance in Chinese-English BTEC task.
This paper describes the ICT Statistical Machine Translation systems that used in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2009. For this year’s evaluation, we participated in the Challenge Task (Chinese-English and English-Chinese) and BTEC Task (Chinese-English). And we mainly focus on one new method to improve single system’s translation quality. Specifically, we developed a sentence-similarity based development set selection technique. For each task, we finally submitted the single system who got the maximum BLEU scores on the selected development set. The four single translation systems are based on different techniques: a linguistically syntax-based system, two formally syntax-based systems and a phrase-based system. Typically, we didn’t use any rescoring or system combination techniques in this year’s evaluation.
This paper describes the LIG experiments in the context of IWSLT09 evaluation (Arabic to English Statistical Machine Translation task). Arabic is a morphologically rich language, and recent experimentations in our laboratory have shown that the performance of Arabic to English SMT systems varies greatly according to the Arabic morphological segmenters applied. Based on this observation, we propose to use simultaneously multiple segmentations for machine translation of Arabic. The core idea is to keep the ambiguity of the Arabic segmentation in the system input (using confusion networks or lattices). Then, we hope that the best segmentation will be chosen during MT decoding. The mathematics of this multiple segmentation approach are given. Practical implementations in the case of verbatim text translation as well as speech translation (outside of the scope of IWSLT09 this year) are proposed. Experiments conducted in the framework of IWSLT evaluation campaign show the potential of the multiple segmentation approach. The last part of this paper explains in detail the different systems submitted by LIG at IWSLT09 and the results obtained.
This paper describes the systems developed by the LIUM laboratory for the 2009 IWSLT evaluation. We participated in the Arabic and Chinese to English BTEC tasks. We developed three different systems: a statistical phrase-based system using the Moses toolkit, an Statistical Post-Editing system and a hierarchical phrase-based system based on Joshua. A continuous space language model was deployed to improve the modeling of the target language. These systems are combined by a confusion network based approach.
This paper describes the MIT-LL/AFRL statistical MT system and the improvements that were developed during the IWSLT 2009 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance on the Arabic and Turkish to English translation tasks. We discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2008 system, and experiments we ran during the IWSLT-2009 evaluation. Specifically, we focus on 1) Cross-domain translation using MAP adaptation and unsupervised training, 2) Turkish morphological processing and translation, 3) improved Arabic morphology for MT preprocessing, and 4) system combination methods for machine translation.
This paper describes the NICT SMT system used in the International Workshop on Spoken Language Translation (IWSLT) 2009 evaluation campaign. We participated in the Challenge Task. Our system was based on a fairly common phrase-based machine translation system. We used two methods for stabilizing MERT.
This paper reports on the participation of CASIA (Institute of Automation Chinese Academy of Sciences) at the evaluation campaign of the International Workshop on Spoken Language Translation 2009. We participated in the challenge tasks for Chinese-to-English and English-to-Chinese translation respectively and the BTEC task for Chinese-to-English translation only. For all of the tasks, system performance is improved with some special methods as follows: 1) combining different results of Chinese word segmentation, 2) combining different results of word alignments, 3) adding reliable bilingual words with high probabilities to the training data, 4) handling named entities including person names, location names, organization names, temporal and numerical expressions additionally, 5) combining and selecting translations from the outputs of multiple translation engines, 6) replacing Chinese character with Chinese Pinyin to train the translation model for Chinese-to-English ASR challenge task. This is a new approach that has never been introduced before.
We describe the system developed by the team of the National University of Singapore for the Chinese-English BTEC task of the IWSLT 2009 evaluation campaign. We adopted a state-of-the-art phrase-based statistical machine translation approach and focused on experiments with different Chinese word segmentation standards. In our official submission, we trained a separate system for each segmenter and we combined the outputs in a subsequent re-ranking step. Given the small size of the training data, we further re-trained the system on the development data after tuning. The evaluation results show that both strategies yield sizeable and consistent improvements in translation quality.
We present the UOT Machine Translation System that was used in the IWSLT-09 evaluation campaign. This year, we participated in the BTEC track for Chinese-to-English translation. Our system is based on a string-to-tree framework. To integrate deep syntactic information, we propose the use of parse trees and semantic dependencies on English sentences described respectively by Head-driven Phrase Structure Grammar and Predicate-Argument Structures. We report the results of our system on both the development and test sets.
We have developed a two-stage machine translation (MT) system. The first stage is a rule-based machine translation system. The second stage is a normal statistical machine translation system. For Chinese-English machine translation, first, we used a Chinese-English rule-based MT, and we obtained ”ENGLISH” sentences from Chinese sentences. Second, we used a standard statistical machine translation. This means that we translated ”ENGLISH” to English machine translation. We believe this method has two advantages. One is that there are fewer unknown words. The other is that it produces structured or grammatically correct sentences. From the results of experiments, we obtained a BLEU score of 0.3151 in the BTEC-CE task using our proposed method. In contrast, we obtained a BLEU score of 0.3311 in the BTEC-CE task using a standard method (moses). This means that our proposed method was not as effective for the BTEC-CE task. Therefore, we will try to improve the performance by optimizing parameters.
We describe our Arabic-to-English and Turkish-to-English machine translation systems that participated in the IWSLT 2009 evaluation campaign. Both systems are based on the Moses statistical machine translation toolkit, with added components to address the rich morphology of the source languages. Three different morphological approaches are investigated for Turkish. Our primary submission uses linguistic morphological analysis and statistical disambiguation to generate morpheme-based translation models, which is the approach with the better translation performance. One of the contrastive submissions utilizes unsupervised subword segmentation to generate non-linguistic subword-based translation models, while another contrastive system uses word-based models but makes use of lexical approximation to cope with out-of-vocabulary words, similar to the approach in our Arabic-to-English submission.
In this paper, we describe the machine translation system developed at the Polytechnic University of Valencia, which was used in our participation in the International Workshop on Spoken Language Translation (IWSLT) 2009. We have taken part only in the Chinese-English BTEC Task. In the evaluation campaign, we focused on the use of our hybrid translation system over the provided corpus and less effort was devoted to the use of preand post-processing techniques that could have improved the results. Our decoder is a hybrid machine translation system that combines phrase-based models together with syntax-based translation models. The syntactic formalism that underlies the whole decoding process is a Chomsky Normal Form Stochastic Inversion Transduction Grammar (SITG) with phrasal productions and a log-linear combination of probability models. The decoding algorithm is a CYK-like algorithm that combines the translated phrases inversely or directly in order to get a complete translation of the input sentence.
This paper describes the University of Washington’s system for the 2009 International Workshop on Spoken Language Translation (IWSLT) evaluation campaign. Two systems were developed, one each for the BTEC Chinese-to-English and Arabic-to-English tracks. We describe experiments with different preprocessing and alignment combination schemes. Our main focus this year was on exploring a novel semi-supervised approach to N-best list reranking; however, this method yielded inconclusive results.