Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

Marcello Federico, Sebastian Stüker, François Yvon (Editors)

Anthology ID:
December 4-5
Lake Tahoe, California
Bib Export formats:

pdf bib
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign
Marcello Federico | Sebastian Stüker | François Yvon

pdf bib
Report on the 11th IWSLT evaluation campaign
Mauro Cettolo | Jan Niehues | Sebastian Stüker | Luisa Bentivogli | Marcello Federico

The paper overviews the 11th evaluation campaign organized by the IWSLT workshop. The 2014 evaluation offered multiple tracks on lecture transcription and translation based on the TED Talks corpus. In particular, this year IWSLT included three automatic speech recognition tracks, on English, German and Italian, five speech translation tracks, from English to French, English to German, German to English, English to Italian, and Italian to English, and five text translation track, also from English to French, English to German, German to English, English to Italian, and Italian to English. In addition to the official tracks, speech and text translation optional tracks were offered, globally involving 12 other languages: Arabic, Spanish, Portuguese (B), Hebrew, Chinese, Polish, Persian, Slovenian, Turkish, Dutch, Romanian, Russian. Overall, 21 teams participated in the evaluation, for a total of 76 primary runs submitted. Participants were also asked to submit runs on the 2013 test set (progress test set), in order to measure the progress of systems with respect to the previous year. All runs were evaluated with objective metrics, and submissions for two of the official text translation tracks were also evaluated with human post-editing.

pdf bib
FBK @ IWSLT 2014 – ASR track
B. BabaAli | R. Serizel | S. Jalalvand | R. Gretter | D. Giuliani

This paper reports on the participation of FBK in the IWSLT 2014 evaluation campaign for Automatic Speech Recognition (ASR), which focused on the transcription of TED talks. The outputs of primary and contrastive systems were submitted for three languages, namely English, German and Italian. Most effort went into the development of the English transcription system. The primary system is based on the ROVER combination of the output of 5 transcription subsystems which are all based on the Deep Neural Network Hidden Markov Model (DNN-HMM) hybrid. Before combination, word lattices generated by each sub-system are rescored using an efficient interpolation of 4-gram and Recurrent Neural Network (RNN) language models. The primary system achieves a Word Error Rate (WER) of 14.7% and 11.4% on the 2013 and 2014 official IWSLT English test sets, respectively. The subspace Gaussian mixture model (SGMM) system developed for German achieves 39.5% WER on the 2014 IWSLT German test sets. For Italian, the primary transcription system was based on hidden Markov models and achieves 23.8% WER on the 2014 IWSLT Italian test set.

The UEDIN ASR systems for the IWSLT 2014 evaluation
Peter Bell | Pawel Swietojanski | Joris Driesen | Mark Sinclair | Fergus McInnes | Steve Renals

This paper describes the University of Edinburgh (UEDIN) ASR systems for the 2014 IWSLT Evaluation. Notable features of the English system include deep neural network acoustic models in both tandem and hybrid configuration with the use of multi-level adaptive networks, LHUC adaptation and Maxout units. The German system includes lightly supervised training and a new method for dictionary generation. Our voice activity detection system now uses a semi-Markov model to incorporate a prior on utterance lengths. There are improvements of up to 30% relative WER on the tst2013 English test set.

Improving MEANT based semantically tuned SMT
Meriem Beloucif | Chi-kiu Lo | Dekai Wu

We discuss various improvements to our MEANT tuned system, previously presented at IWSLT 2013. In our 2014 system, we incorporate this year’s improved version of MEANT, improved Chinese word segmentation, Chinese named entity recognition and dedicated proper name translation, and number expression handling. This results in a significant performance jump compared to last year’s system. We also ran preliminary experiments on tuning to IMEANT, our new ITG based variant of MEANT. The performance of tuning to IMEANT is comparable to tuning on MEANT (differences are statistically insignificant). We are presently investigating if tuning on IMEANT can produce even better results, since IMEANT was actually shown to correlate with human adequacy judgment more closely than MEANT. Finally, we ran experiments applying our new architectural improvements to a contrastive system tuned to BLEU. We observed a slightly higher jump in comparison to last year, possibly due to mismatches of MEANT’s similarity models to our new entity handling.

FBK’s machine translation and speech translation systems for the IWSLT 2014 evaluation campaign
Nicola Bertoldi | Prashanu Mathur | Nicolas Ruiz | Marcello Federico

This paper describes the systems submitted by FBK for the MT and SLT tracks of IWSLT 2014. We participated in the English-French and German-English machine translation tasks, as well as the English-French speech translation task. We report improvements in our English-French MT systems over last year’s baselines, largely due to improved techniques of combining translation and language models, and using huge language models. For our German-English system, we experimented with a novel domain adaptation technique. For both language pairs we also applied a novel word triggerbased model which shows slight improvements on EnglishFrench and German-English systems. Our English-French SLT system utilizes MT-based punctuation insertion, recasing, and ASR-like synthesized MT training data.

Edinburgh SLT and MT system description for the IWSLT 2014 evaluation
Alexandra Birch | Matthias Huck | Nadir Durrani | Nikolay Bogoychev | Philipp Koehn

This paper describes the University of Edinburgh’s spoken language translation (SLT) and machine translation (MT) systems for the IWSLT 2014 evaluation campaign. In the SLT track, we participated in the German↔English and English→French tasks. In the MT track, we participated in the German↔English, English→French, Arabic↔English, Farsi→English, Hebrew→English, Spanish↔English, and Portuguese-Brazil↔English tasks. For our SLT submissions, we experimented with comparing operation sequence models with bilingual neural network language models. For our MT submissions, we explored using unsupervised transliteration for languages which have a different script than English, in particular for Arabic, Farsi, and Hebrew. We also investigated syntax-based translation and system combination.

Combined spoken language translation
Markus Freitag | Joern Wuebker | Stephan Peitz | Hermann Ney | Matthias Huck | Alexandra Birch | Nadir Durrani | Philipp Koehn | Mohammed Mediani | Isabel Slawik | Jan Niehues | Eunach Cho | Alex Waibel | Nicola Bertoldi | Mauro Cettolo | Marcello Federico

EU-BRIDGE is a European research project which is aimed at developing innovative speech translation technology. One of the collaborative efforts within EU-BRIDGE is to produce joint submissions of up to four different partners to the evaluation campaign at the 2014 International Workshop on Spoken Language Translation (IWSLT). We submitted combined translations to the German→English spoken language translation (SLT) track as well as to the German→English, English→German and English→French machine translation (MT) tracks. In this paper, we present the techniques which were applied by the different individual translation systems of RWTH Aachen University, the University of Edinburgh, Karlsruhe Institute of Technology, and Fondazione Bruno Kessler. We then show the combination approach developed at RWTH Aachen University which combined the individual systems. The consensus translations yield empirical gains of up to 2.3 points in BLEU and 1.2 points in TER compared to the best individual system.

The MITLL-AFRL IWSLT 2014 MT system
Michaeel Kazi | Elizabeth Salesky | Brian Thompson | Jessica Ray | Michael Coury | Tim Anderson | Grant Erdmann | Jeremy Gwinnup | Katherine Young | Brian Ore | Michael Hutt

This report summarizes the MITLL-AFRL MT and ASR systems and the experiments run using them during the 2014 IWSLT evaluation campaign. Our MT system is much improved over last year, owing to integration of techniques such as PRO and DREM optimization, factored language models, neural network joint model rescoring, multiple phrase tables, and development set creation. We focused our eforts this year on the tasks of translating from Arabic, Russian, Chinese, and Farsi into English, as well as translating from English to French. ASR performance also improved, partly due to increased eforts with deep neural networks for hybrid and tandem systems. Work focused on both the English and Italian ASR tasks.

The 2014 KIT IWSLT speech-to-text systems for English, German and Italian
Kevin Kilgour | Michael Heck | Markus Müller | Matthias Sperber | Sebastian Stüker | Alex Waibel

This paper describes our German, Italian and English Speech-to-Text (STT) systems for the 2014 IWSLT TED ASR track. Our setup uses ROVER and confusion network combination from various subsystems to achieve a good overall performance. The individual subsystems are built by using different front-ends, (e.g., MVDR-MFCC or lMel), acoustic models (GMM or modular DNN) and phone sets and by training on various subsets of the training data. Decoding is performed in two stages, where the GMM systems are adapted in an unsupervised manner on the combination of the first stage outputs using VTLN, MLLR, and cMLLR. The combination setup produces a final hypothesis that has a significantly lower WER than any of the individual subsystems.

A topic-based approach for post-processing correction of automatic translations
Mohamed Morchid | Stéphane Huet | Richard Dufour

We present the LIA systems for the machine translation evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2014 for the English-to-Slovene and English-to-Polish translation tasks. The proposed approach takes into account word context; first, it maps sentences into a latent Dirichlet allocation (LDA) topic space, then it chooses from this space words that are thematically and grammatically close to mistranslated words. This original post-processing approach is compared with a factored translation system built with MOSES. While this postprocessing method does not allow us to achieve better results than a state-of-the-art system, this should be an interesting way to explore, for example by adding this topic space information at an early stage in the translation process.

The USFD SLT system for IWSLT 2014
Raymond W. M. Ng | Mortaza Doulaty | Rama Doddipatla | Wilker Aziz | Kashif Shah | Oscar Saz | Madina Hasan | Ghada AlHaribi | Lucia Specia | Thomas Hain

The University of Sheffield (USFD) participated in the International Workshop for Spoken Language Translation (IWSLT) in 2014. In this paper, we will introduce the USFD SLT system for IWSLT. Automatic speech recognition (ASR) is achieved by two multi-pass deep neural network systems with adaptation and rescoring techniques. Machine translation (MT) is achieved by a phrase-based system. The USFD primary system incorporates state-of-the-art ASR and MT techniques and gives a BLEU score of 23.45 and 14.75 on the English-to-French and English-to-German speech-to-text translation task with the IWSLT 2014 data. The USFD contrastive systems explore the integration of ASR and MT by using a quality estimation system to rescore the ASR outputs, optimising towards better translation. This gives a further 0.54 and 0.26 BLEU improvement respectively on the IWSLT 2012 and 2014 evaluation data.

The speech recognition systems of IOIT for IWSLT 2014
Quoc Bao Nguyen | Tat Thang Vu | Chi Mai Luong

This paper describes the speech recognition systems of IOIT for IWSLT 2014 TED ASR track. This year, we focus on improving acoustic model for the systems using two main approaches of deep neural network which are hybrid and bottleneck feature systems. These two subsystems are combined using lattice Minimum Bayes-Risk decoding. On the 2013 evaluations set, which serves as a progress test set, we were able to reduce the word error rate of our transcription systems from 27.2% to 24.0%, a relative reduction of 11.7%.

Phrase-based language modelling for statistical machine translation
Achraf Ben Romdhane | Salma Jamoussi | Abdelmajid Ben Hamadou | Kamel Smaïli

In this paper, we present our submitted MT system for the IWSLT2014 Evaluation Campaign. We participated in the English-French translation task. In this article we focus on one of the most important component of SMT: the language model. The idea is to use a phrase-based language model. For that, sequences from the source and the target language models are retrieved and used to calculate a phrase n-gram language model. These phrases are used to rewrite the parallel corpus which is then used to calculate a new translation model.

LIUM English-to-French spoken language translation system and the Vecsys/LIUM automatic speech recognition system for Italian language for IWSLT 2014
Anthony Rousseau | Loïc Barrault | Paul Deléglise | Yannick Estève | Holger Schwenk | Samir Bennacef | Armando Muscariello | Stephan Vanni

This paper describes the Spoken Language Translation system developed by the LIUM for the IWSLT 2014 evaluation campaign. We participated in two of the proposed tasks: (i) the Automatic Speech Recognition task (ASR) in two languages, Italian with the Vecsys company, and English alone, (ii) the English to French Spoken Language Translation task (SLT). We present the approaches and specificities found in our systems, as well as the results from the evaluation campaign.

LIMSI English-French speech translation system
Natalia Segal | Hélène Bonneau-Maynard | Quoc Khanh Do | Alexandre Allauzen | Jean-Luc Gauvain | Lori Lamel | François Yvon

This paper documents the systems developed by LIMSI for the IWSLT 2014 speech translation task (English→French). The main objective of this participation was twofold: adapting different components of the ASR baseline system to the peculiarities of TED talks and improving the machine translation quality on the automatic speech recognition output data. For the latter task, various techniques have been considered: punctuation and number normalization, adaptation to ASR errors, as well as the use of structured output layer neural network models for speech data.

The NCT ASR system for IWSLT 2014
Peng Shen | Yugang Lu | Xinhui Hu | Naoyuki Kanda | Masahiro Saiko | Chiori Hori

This paper describes our automatic speech recognition system for IWSLT2014 evaluation campaign. The system is based on weighted finite-state transducers and a combination of multiple subsystems which consists of four types of acoustic feature sets, four types of acoustic models, and N-gram and recurrent neural network language models. Compared with our system used in last year, we added additional subsystems based on deep neural network modeling on filter bank feature and convolutional deep neural network modeling on filter bank feature with tonal features. In addition, modifications and improvements on automatic acoustic segmentation and deep neural network speaker adaptation were applied. Compared with our last year’s system on speech recognition experiments, our new system achieved 21.5% relative improvement on word error rate on the 2013 English test data set.

The KIT translation systems for IWSLT 2014
Isabel Slawik | Mohammed Mediani | Jan Niehues | Yuqi Zhang | Eunah Cho | Teresa Herrmann | Thanh-Le Ha | Alex Waibel

In this paper, we present the KIT systems participating in the TED translation tasks of the IWSLT 2014 machine translation evaluation. We submitted phrase-based translation systems for all three official directions, namely English→German, German→English, and English→French, as well as for the optional directions English→Chinese and English→Arabic. For the official directions we built systems both for the machine translation as well as the spoken language translation track. This year we improved our systems’ performance over last year through n-best list rescoring using neural network-based translation and language models and novel preordering rules based on tree information of multiple syntactic levels. Furthermore, we could successfully apply a novel phrase extraction algorithm and transliteration of unknown words for Arabic. We also submitted a contrastive system for German→English built with stemmed German adjectives. For the SLT tracks, we used a monolingual translation system to translate the lowercased ASR hypotheses with all punctuation stripped to truecased, punctuated output as a preprocessing step to our usual translation system.

NTT-NAIST syntax-based SMT systems for IWSLT 2014
Katsuhito Sudoh | Graham Neubig | Kevin Duh | Katsuhiko Hayashi

This paper presents NTT-NAIST SMT systems for English-German and German-English MT tasks of the IWSLT 2014 evaluation campaign. The systems are based on generalized minimum Bayes risk system combination of three SMT systems using the forest-to-string, syntactic preordering, and phrase-based translation formalisms. Individual systems employ training data selection for domain adaptation, truecasing, compound word splitting (for GermanEnglish), interpolated n-gram language models, and hypotheses rescoring using recurrent neural network language models.

The USTC machine translation system for IWSLT 2014
Shijin Wang | Yuguang Wang | Jianfeng Li | Yiming Cui | Lirong Dai

The NICT translation system for IWSLT 2014
Xiaolin Wang | Andrew Finch | Masao Utiyama | Taro Watanabe | Eiichiro Sumita

This paper describes NICT’s participation in the IWSLT 2014 evaluation campaign for the TED Chinese-English translation shared-task. Our approach used a combination of phrase-based and hierarchical statistical machine translation (SMT) systems. Our focus was in several areas, specifically system combination, word alignment, and various language modeling techniques including the use of neural network joint models. Our experiments on the test set from the 2013 shared task, showed that an improvement in BLEU score can be gained in translation performance through all of these techniques, with the largest improvements coming from using large data sizes to train the language model.

Polish-English speech statistical machine translation systems for the IWSLT 2014
Krzysztof Wolk | Krzysztof Marasek

This research explores effects of various training settings between Polish and English Statistical Machine Translation systems for spoken language. Various elements of the TED parallel text corpora for the IWSLT 2014 evaluation campaign were used as the basis for training of language models, and for development, tuning and testing of the translation system as well as Wikipedia based comparable corpora prepared by us. The BLEU, NIST, METEOR and TER metrics were used to evaluate the effects of data preparations on translation results. Our experiments included systems, which use lemma and morphological information on Polish words. We also conducted a deep analysis of provided Polish data as preparatory work for the automatic data correction and cleaning phase.

The RWTH Aachen machine translation systems for IWSLT 2014
Joern Wuebker | Stephan Peitz | Andreas Guta | Hermann Ney

This work describes the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign International Workshop on Spoken Language Translation (IWSLT) 2014. We participated in both the MT and SLT tracks for the English→French and German→English language pairs and applied the identical training pipeline and models on both language pairs. Our state-of-the-art phrase-based baseline systems are augmented with maximum expected BLEU training for phrasal, lexical and reordering models. Further, we apply rescoring with novel recurrent neural language and translation models. The same systems are used for the SLT track, where we additionally perform punctuation prediction on the automatic transcriptions employing hierarchical phrase-based translation. We are able to improve RWTH’s 2013 evaluation systems by 1.7-1.8% BLEU absolute.