Arne Mauser

2012

Deciphering Foreign Language by Combining Language Models and Context Vectors
Malte Nuhn | Arne Mauser | Hermann Ney
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Forced Derivations for Hierarchical Machine Translation
Stephan Peitz | Arne Mauser | Joern Wuebker | Hermann Ney
Proceedings of COLING 2012: Posters

2011

Modeling punctuation prediction as machine translation
Stephan Peitz | Markus Freitag | Arne Mauser | Hermann Ney
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

Punctuation prediction is an important task in Spoken Language Translation. The output of speech recognition systems does not typically contain punctuation marks. In this paper we analyze different methods for punctuation prediction and show improvements in the quality of the final translation output. In our experiments we compare the different approaches and show improvements of up to 0.8 BLEU points on the IWSLT 2011 English-French Speech Translation of Talks task using a translation system to translate from unpunctuated to punctuated text instead of a language-model-based punctuation prediction method. Furthermore, we perform a system combination of the hypotheses of all our different approaches and get an additional improvement of 0.4 points in BLEU.

2010

Training Phrase Translation Models with Leaving-One-Out
Joern Wuebker | Arne Mauser | Hermann Ney
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

A Source-side Decoding Sequence Model for Statistical Machine Translation
Minwei Feng | Arne Mauser | Hermann Ney
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

We propose a source-side decoding sequence language model for phrase-based statistical machine translation. This model is a reordering model in the sense that it helps the decoder find the correct decoding sequence. The model uses word-aligned bilingual training data. We show improved translation quality of up to 1.34% BLEU and 0.54% TER using this model compared to three other widely used reordering models.

2009

Extending Statistical Machine Translation with Discriminative and Trigger-Based Lexicon Models
Arne Mauser | Saša Hasan | Hermann Ney
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

Automatic Evaluation Measures for Statistical Machine Translation System Optimization
Arne Mauser | Saša Hasan | Hermann Ney
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Evaluation of machine translation (MT) output is a challenging task. In most cases, there is no single correct translation. In the extreme case, two translations of the same input can have completely different words and sentence structure while still both being perfectly valid. Large projects and competitions for MT research raised the need for reliable and efficient evaluation of MT systems. For the funding side, the obvious motivation is to measure performance and progress of research. This often results in a specific measure or metric being taken as the primary evaluation criterion. Do improvements in one measure really lead to improved MT performance? How does a gain in one evaluation metric affect other measures? This paper answers these questions through a number of experiments.

RWTH’s system for the 2008 IWSLT evaluation consists of a combination of different phrase-based and hierarchical statistical machine translation systems. We participated in the translation tasks for the Chinese-to-English and Arabic-to-English language pairs. We investigated different preprocessing techniques, reordering methods for the phrase-based system, including reordering of speech lattices, and syntax-based enhancements for the hierarchical systems. We also tried the combination of the Arabic-to-English and Chinese-to-English outputs as an additional submission.

2007

The RWTH machine translation system for IWSLT 2007
Arne Mauser | David Vilar | Gregor Leusch | Yuqi Zhang | Hermann Ney
Proceedings of the Fourth International Workshop on Spoken Language Translation

The RWTH system for the IWSLT 2007 evaluation is a combination of several statistical machine translation systems. The combination includes phrase-based models, an n-gram translation model and a hierarchical phrase model. We describe the individual systems and the method that was used for combining the system outputs. Compared to our 2006 system, we newly introduce a hierarchical phrase-based translation model and show improvements in system combination for machine translation. RWTH participated in the Italian-to-English and Chinese-to-English translation directions.

2006

The RWTH statistical machine translation system for the IWSLT 2006 evaluation
Arne Mauser | Richard Zens | Evgeny Matusov | Saša Hasan | Hermann Ney
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign

Automatic sentence segmentation and punctuation prediction for spoken language translation
Evgeny Matusov | Arne Mauser | Hermann Ney
Proceedings of the Third International Workshop on Spoken Language Translation: Papers

Training a Statistical Machine Translation System without GIZA++
Arne Mauser | Evgeny Matusov | Hermann Ney
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The IBM Models (Brown et al., 1993) enjoy great popularity in the machine translation community because they offer high quality word alignments and a free implementation is available with the GIZA++ Toolkit (Och and Ney, 2003). Several methods have been developed to overcome the asymmetry of the alignment generated by the IBM Models. A remaining disadvantage, however, is the high model complexity. This paper describes a word alignment training procedure for statistical machine translation that uses a simple and clear statistical model, different from the IBM models. The main idea of the algorithm is to generate a symmetric and monotonic alignment between the target sentence and a permutation graph representing different reorderings of the words in the source sentence. The quality of the generated alignment is shown to be comparable to the standard GIZA++ training in an SMT setup.

2005

This article presents a statistical machine translation method based on non-contiguous phrases, that is, phrases formed of words that do not necessarily appear contiguously in the text. We propose a method for producing such phrases from word-aligned corpora. We also present a statistical translation model capable of taking such phrases into account, as well as a method for training the model parameters that aims to maximize the accuracy of the translations produced, as measured by the NIST metric. Optimal translations are produced by means of a beam search. Finally, we present experimental results that demonstrate how the proposed method allows better generalization from the training data.