Hermann Ney

Also published as: H. Ney

2023

Improving Language Model Integration for Neural Machine Translation
Christian Herold | Yingbo Gao | Mohammad Zeineldeen | Hermann Ney
Findings of the Association for Computational Linguistics: ACL 2023

The integration of language models for neural machine translation has been extensively studied in the past. It has been shown that an external language model, trained on additional target-side monolingual data, can help improve translation quality. However, there has always been the assumption that the translation model also learns an implicit target-side language model during training, which interferes with the external language model at decoding time. Recently, some works on automatic speech recognition have demonstrated that, if the implicit language model is neutralized in decoding, further improvements can be gained when integrating an external language model. In this work, we transfer this concept to the task of machine translation and compare with the most prominent way of including additional monolingual data - namely back-translation. We find that accounting for the implicit language model significantly boosts the performance of language model fusion, although this approach is still outperformed by back-translation.
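The idea of neutralizing the implicit language model at decoding time can be illustrated with a small sketch. It assumes the log-linear form common in the speech recognition literature: the external language model score is added and an estimate of the implicit language model score is subtracted. The abstract does not state the exact formula, and the weights and probabilities below are hypothetical values chosen only for illustration.

```python
import math


def fused_log_score(p_tm, p_lm, p_ilm, lm_weight, ilm_weight):
    """Log-linear fusion: translation model plus external LM minus implicit LM."""
    return math.log(p_tm) + lm_weight * math.log(p_lm) - ilm_weight * math.log(p_ilm)


def best_token(candidates, lm_weight=0.5, ilm_weight=0.5):
    """Return the candidate token with the highest fused score."""
    return max(
        candidates,
        key=lambda tok: fused_log_score(*candidates[tok], lm_weight, ilm_weight),
    )


# Hypothetical per-token probabilities: (translation model, external LM, implicit LM).
candidates = {
    "bank": (0.5, 0.2, 0.6),
    "shore": (0.4, 0.5, 0.2),
}

# With both weights at zero, only the translation model decides.
print(best_token(candidates, lm_weight=0.0, ilm_weight=0.0))
# With fusion and implicit LM subtraction, the ranking can change.
print(best_token(candidates))
```

In this toy setting, the translation model alone prefers "bank", while the fused score prefers "shore" because the external language model favors it and the implicit language model prior for "bank" is discounted.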

On Search Strategies for Document-Level Neural Machine Translation
Christian Herold | Hermann Ney
Findings of the Association for Computational Linguistics: ACL 2023

Compared to sentence-level systems, document-level neural machine translation (NMT) models produce a more consistent output across a document and are able to better resolve ambiguities within the input. There are many works on document-level NMT, mostly focusing on modifying the model architecture or training strategy to better accommodate the additional context input. On the other hand, in most works, the question of how to perform search with the trained model is scarcely discussed, and sometimes not mentioned at all. In this work, we aim to answer the question of how to best utilize a context-aware translation model in decoding. We start with the most popular document-level NMT approach and compare different decoding schemes, some from the literature and others proposed by us. In the comparison, we use both standard automatic metrics and specific linguistic phenomena on three standard document-level translation benchmarks. We find that the most commonly used decoding strategies perform similarly to each other and that higher-quality context information has the potential to further improve the translation.

Document-Level Language Models for Machine Translation
Frithjof Petrick | Christian Herold | Pavel Petrushkov | Shahram Khadivi | Hermann Ney
Proceedings of the Eighth Conference on Machine Translation

Despite the known limitations, most machine translation systems today still operate at the sentence level. One reason for this is that most parallel training data is only sentence-level aligned, without document-level meta information available. In this work, we set out to build context-aware translation systems utilizing document-level monolingual data instead. This can be achieved by combining any existing sentence-level translation model with a document-level language model. We improve existing approaches by leveraging recent advancements in model combination. Additionally, we propose novel weighting techniques that make the system combination more flexible and significantly reduce computational overhead. In a comprehensive evaluation on four diverse translation tasks, we show that our extensions improve document-targeted scores significantly and are also computationally more efficient. However, we also find that in most scenarios, back-translation gives even better results, at the cost of having to re-train the translation system. Finally, we explore language model fusion in the light of recent advancements in large language models. Our findings suggest that there might be strong potential in utilizing large language models via model combination.

Improving Long Context Document-Level Machine Translation
Christian Herold | Hermann Ney
Proceedings of the 4th Workshop on Computational Approaches to Discourse (CODI 2023)

Document-level context for neural machine translation (NMT) is crucial to improve the translation consistency and cohesion, the translation of ambiguous inputs, as well as several other linguistic phenomena. Many works have been published on the topic of document-level NMT, but most restrict the system to only local context, typically including just the one or two preceding sentences as additional information. This might be enough to resolve some ambiguous inputs, but it is probably not sufficient to capture some document-level information like the topic or style of a conversation. When increasing the context size beyond just the local context, there are two challenges: (i) the memory usage increases exponentially and (ii) the translation performance starts to degrade. We argue that the widely-used attention mechanism is responsible for both issues. Therefore, we propose a constrained attention variant that focuses the attention on the most relevant parts of the sequence, while simultaneously reducing the memory consumption. For evaluation, we utilize targeted test sets in combination with novel evaluation techniques to analyze the translations with regard to specific discourse-related phenomena. We find that our approach is a good compromise between sentence-level NMT and attending to the full context, especially in low-resource scenarios.

2022

Does Joint Training Really Help Cascaded Speech Translation?
Viet Anh Khoa Tran | David Thulke | Yingbo Gao | Christian Herold | Hermann Ney
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Currently, in speech translation, the straightforward approach - cascading a recognition system with a translation system - delivers state-of-the-art results. However, fundamental challenges such as error propagation from the automatic speech recognition system still remain. To mitigate these problems, recently, people turn their attention to direct data and propose various joint training methods. In this work, we seek to answer the question of whether joint training really helps cascaded speech translation. We review recent papers on the topic and also investigate a joint training criterion by marginalizing the transcription posterior probabilities. Our findings show that a strong cascaded baseline can diminish any improvements obtained using joint training, and we suggest alternatives to joint training. We hope this work can serve as a refresher of the current speech translation landscape, and motivate research in finding more efficient and creative ways to utilize the direct data for speech translation.

Detecting Various Types of Noise for Neural Machine Translation
Christian Herold | Jan Rosendahl | Joris Vanvinckenroye | Hermann Ney
Findings of the Association for Computational Linguistics: ACL 2022

The filtering and/or selection of training data is one of the core aspects to be considered when building a strong machine translation system. In their influential work, Khayrallah and Koehn (2018) investigated the impact of different types of noise on the performance of machine translation systems. In the same year the WMT introduced a shared task on parallel corpus filtering, which went on to be repeated in the following years, and resulted in many different filtering approaches being proposed. In this work we aim to combine the recent achievements in data filtering with the original analysis of Khayrallah and Koehn (2018) and investigate whether state-of-the-art filtering systems are capable of removing all the suggested noise types. We observe that most of these types of noise can be detected with an accuracy of over 90% by modern filtering systems when operating in a well studied high resource setting. However, we also find that when confronted with more refined noise categories or when working with a less common language pair, the performance of the filtering systems is far from optimal, showing that there is still room for improvement in this area of research.

Revisiting Checkpoint Averaging for Neural Machine Translation
Yingbo Gao | Christian Herold | Zijian Yang | Hermann Ney
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022

Checkpoint averaging is a simple and effective method to boost the performance of converged neural machine translation models. The calculation is cheap to perform, and the fact that the translation improvement comes almost for free makes it widely adopted in neural machine translation research. Despite its popularity, the method itself simply takes the mean of the model parameters from several checkpoints, the selection of which is mostly based on empirical recipes without much justification. In this work, we revisit the concept of checkpoint averaging and consider several extensions. Specifically, we experiment with ideas such as using different checkpoint selection strategies, calculating a weighted average instead of a simple mean, making use of gradient information, and fine-tuning the interpolation weights on development data. Our results confirm the necessity of applying checkpoint averaging for optimal performance, but also suggest that the landscape between the converged checkpoints is rather flat and not much further improvement compared to simple averaging is to be obtained.
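As a minimal sketch of the operation described above, the following averages model parameters across checkpoints, either as a simple mean or with interpolation weights. Checkpoints are represented here as plain dictionaries mapping hypothetical parameter names to lists of floats. This is an assumption for illustration, not the data format or code used in the paper.

```python
def average_checkpoints(checkpoints, weights=None):
    """Average parameters across checkpoints.

    checkpoints: list of dicts mapping parameter name -> list of floats.
    weights: optional interpolation weights; equal weights give the simple mean.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        weights = [1.0] * len(checkpoints)
    if len(weights) != len(checkpoints):
        raise ValueError("one weight per checkpoint is required")

    total = sum(weights)  # normalize so the weights effectively sum to one
    averaged = {}
    for name, first_values in checkpoints[0].items():
        averaged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(weights, checkpoints)) / total
            for i in range(len(first_values))
        ]
    return averaged


# Two hypothetical checkpoints with a single parameter vector each.
ckpt_a = {"layer.weight": [1.0, 3.0]}
ckpt_b = {"layer.weight": [3.0, 5.0]}

print(average_checkpoints([ckpt_a, ckpt_b]))              # simple mean
print(average_checkpoints([ckpt_a, ckpt_b], [3.0, 1.0]))  # weighted average
```

The weighted variant corresponds to the interpolation-weight extension mentioned in the abstract. How those weights would be tuned on development data is outside the scope of this sketch.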

Controllable Factuality in Document-Grounded Dialog Systems Using a Noisy Channel Model
Nico Daheim | David Thulke | Christian Dugast | Hermann Ney
Findings of the Association for Computational Linguistics: EMNLP 2022

In this work, we present a model for document-grounded response generation in dialog that is decomposed into two components according to Bayes’ theorem. One component is a traditional ungrounded response generation model and the other component models the reconstruction of the grounding document based on the dialog context and generated response. We propose different approximate decoding schemes and evaluate our approach on multiple open-domain and task-oriented document-grounded dialog datasets. Our experiments show that the model is more factual in terms of automatic factuality metrics than the baseline model. Furthermore, we outline how introducing scaling factors between the components allows for controlling the tradeoff between factuality and fluency in the model output. Finally, we compare our approach to a recently proposed method to control factuality in grounded dialog, CTRL (Rashkin et al., 2021), and show that both approaches can be combined to achieve additional improvements.
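The two-component decomposition can be written out as follows. The notation is assumed here for illustration and is not taken from the paper: r is the response, c the dialog context, d the grounding document, and the scaling factors \lambda_1 and \lambda_2 are hypothetical names for the weights mentioned in the abstract.

```latex
% Bayes' theorem applied to the grounded response model (notation assumed):
p(r \mid c, d) = \frac{p(d \mid c, r)\, p(r \mid c)}{p(d \mid c)}
               \propto p(d \mid c, r)\, p(r \mid c)

% Decoding with scaling factors between the two components:
\hat{r} = \operatorname*{arg\,max}_{r}
          \big[ \lambda_1 \log p(r \mid c) + \lambda_2 \log p(d \mid c, r) \big]
```

Under this reading, a larger \lambda_2 puts more weight on responses from which the grounding document can be reconstructed, while a larger \lambda_1 puts more weight on the ungrounded response model.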

Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token
Baohao Liao | David Thulke | Sanjika Hewavitharana | Hermann Ney | Christof Monz
Findings of the Association for Computational Linguistics: EMNLP 2022

The pre-training of masked language models (MLMs) consumes massive computation to achieve good results on downstream NLP tasks, resulting in a large carbon footprint. In the vanilla MLM, the virtual tokens, [MASK]s, act as placeholders and gather the contextualized information from unmasked tokens to restore the corrupted information. It raises the question of whether we can append [MASK]s at a later layer, to reduce the sequence length for earlier layers and make the pre-training more efficient. We show: (1) [MASK]s can indeed be appended at a later layer, being disentangled from the word embedding; (2) The gathering of contextualized information from unmasked tokens can be conducted with a few layers. By further increasing the masking rate from 15% to 50%, we can pre-train RoBERTa-base and RoBERTa-large from scratch with only 78% and 68% of the original computational budget without any degradation on the GLUE benchmark. When pre-training with the original budget, our method outperforms RoBERTa for 6 out of 8 GLUE tasks, on average by 0.4%.

Is Encoder-Decoder Redundant for Neural Machine Translation?
Yingbo Gao | Christian Herold | Zijian Yang | Hermann Ney
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Encoder-decoder architecture is widely adopted for sequence-to-sequence modeling tasks. For machine translation, despite the evolution from long short-term memory networks to Transformer networks, plus the introduction and development of attention mechanism, encoder-decoder is still the de facto neural network architecture for state-of-the-art models. While the motivation for decoding information from some hidden space is straightforward, the strict separation of the encoding and decoding steps into an encoder and a decoder in the model architecture is not necessarily a must. Compared to the task of autoregressive language modeling in the target language, machine translation simply has an additional source sentence as context. Given the fact that neural language models nowadays can already handle rather long contexts in the target language, it is natural to ask whether simply concatenating the source and target sentences and training a language model to do translation would work. In this work, we investigate the aforementioned concept for machine translation. Specifically, we experiment with bilingual translation, translation with additional target monolingual data, and multilingual translation. In all cases, this alternative approach performs on par with the baseline encoder-decoder Transformer, suggesting that an encoder-decoder architecture might be redundant for neural machine translation.

Locality-Sensitive Hashing for Long Context Neural Machine Translation
Frithjof Petrick | Jan Rosendahl | Christian Herold | Hermann Ney
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

After its introduction, the Transformer architecture quickly became the gold standard for the task of neural machine translation. A major advantage of the Transformer compared to previous architectures is the faster training speed achieved by complete parallelization across timesteps due to the use of attention over recurrent layers. However, this also leads to one of the biggest problems of the Transformer, namely the quadratic time and memory complexity with respect to the input length. In this work, we adapt the locality-sensitive hashing approach of Kitaev et al. (2020) to self-attention in the Transformer, extend it to cross-attention, and apply this memory-efficient framework to sentence- and document-level machine translation. Our experiments show that the LSH attention scheme for sentence-level translation comes at the cost of slightly reduced translation quality. For document-level NMT, we are able to include much bigger context sizes than what is possible with the baseline Transformer. However, more context improves neither translation quality nor scores on targeted test suites.

2021

Investigation on Data Adaptation Techniques for Neural Named Entity Recognition
Evgeniia Tokarchuk | David Thulke | Weiyue Wang | Christian Dugast | Hermann Ney
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop

Data processing is an important step in various natural language processing tasks. As the commonly used datasets in named entity recognition contain only a limited number of samples, it is important to obtain additional labeled data in an efficient and reliable manner. A common practice is to utilize large monolingual unlabeled corpora. Another popular technique is to create synthetic data from the original labeled data (data augmentation). In this work, we investigate the impact of these two methods on the performance of three different named entity recognition tasks.

Transformer-Based Direct Hidden Markov Model for Machine Translation
Weiyue Wang | Zijian Yang | Yingbo Gao | Hermann Ney
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop

The neural hidden Markov model has been proposed as an alternative to attention mechanism in machine translation with recurrent neural networks. However, since the introduction of the transformer models, its performance has been surpassed. This work proposes to introduce the concept of the hidden Markov model to the transformer architecture, which outperforms the transformer baseline. Interestingly, we find that the zero-order model already provides promising performance, giving it an edge compared to a model with first-order dependency, which performs similarly but is significantly slower in training and decoding.

Data Filtering using Cross-Lingual Word Embeddings
Christian Herold | Jan Rosendahl | Joris Vanvinckenroye | Hermann Ney
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Data filtering for machine translation (MT) describes the task of selecting a subset of a given, possibly noisy corpus with the aim to maximize the performance of an MT system trained on this selected data. Over the years, many different filtering approaches have been proposed. However, varying task definitions and data conditions make it difficult to draw a meaningful comparison. In the present work, we aim for a more systematic approach to the task at hand. First, we analyze the performance of language identification, a tool commonly used for data filtering in the MT community, and identify specific weaknesses. Based on our findings, we then propose several novel methods for data filtering, based on cross-lingual word embeddings. We compare our approaches to one of the winning methods from the WMT 2018 shared task on parallel corpus filtering on three real-life, high-resource MT tasks. We find that said method, which performed very strongly in the WMT shared task, does not perform well within our more realistic task conditions. While we find that our approaches come out at the top on all three tasks, different variants perform best on different tasks. Further experiments on the WMT 2020 shared task for parallel corpus filtering show that our methods achieve comparable results to the strongest submissions of this campaign.

Recurrent Attention for the Transformer
Jan Rosendahl | Christian Herold | Frithjof Petrick | Hermann Ney
Proceedings of the Second Workshop on Insights from Negative Results in NLP

In this work, we conduct a comprehensive investigation on one of the centerpieces of modern machine translation systems: the encoder-decoder attention mechanism. Motivated by the concept of first-order alignments, we extend the (cross-)attention mechanism by a recurrent connection, allowing direct access to previous attention/alignment decisions. We propose several ways to include such a recurrency into the attention mechanism. Verifying their performance across different translation tasks we conclude that these extensions and dependencies are not beneficial for the translation performance of the Transformer architecture.

Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer
Evgeniia Tokarchuk | Jan Rosendahl | Weiyue Wang | Pavel Petrushkov | Tomer Lancewicki | Shahram Khadivi | Hermann Ney
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

Complex natural language applications such as speech translation or pivot translation traditionally rely on cascaded models. However, cascaded models are known to be prone to error propagation and model discrepancy problems. Furthermore, there is no possibility of using end-to-end training data in conventional cascaded systems, meaning that the training data most suited for the task cannot be used. Previous studies suggested several approaches for integrated end-to-end training to overcome those problems, however they mostly rely on (synthetic or natural) three-way data. We propose a cascaded model based on the non-autoregressive Transformer that enables end-to-end training without the need for an explicit intermediate representation. This new architecture (i) avoids unnecessary early decisions that can cause errors which are then propagated throughout the cascaded models and (ii) utilizes the end-to-end training data directly. We conduct an evaluation on two pivot-based machine translation tasks, namely French→German and German→Czech. Our experimental results show that the proposed architecture yields an improvement of more than 2 BLEU for French→German over the cascaded baseline.

Cascaded Span Extraction and Response Generation for Document-Grounded Dialog
Nico Daheim | David Thulke | Christian Dugast | Hermann Ney
Proceedings of the 1st Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc 2021)

This paper summarizes our entries to both subtasks of the first DialDoc shared task which focuses on the agent response prediction task in goal-oriented document-grounded dialogs. The task is split into two subtasks: predicting a span in a document that grounds an agent turn and generating an agent response based on a dialog and grounding document. In the first subtask, we restrict the set of valid spans to the ones defined in the dataset, use a biaffine classifier to model spans, and finally use an ensemble of different models. For the second sub-task, we use a cascaded model which grounds the response prediction on the predicted span instead of the full document. With these approaches, we obtain significant improvements in both subtasks compared to the baseline.

2020

Unifying Input and Output Smoothing in Neural Machine Translation
Yingbo Gao | Baohao Liao | Hermann Ney
Proceedings of the 28th International Conference on Computational Linguistics

Soft contextualized data augmentation is a recent method that replaces one-hot representation of words with soft posterior distributions of an external language model, smoothing the input of neural machine translation systems. Label smoothing is another effective method that penalizes over-confident model outputs by discounting some probability mass from the true target word, smoothing the output of neural machine translation systems. Having the benefit of updating all word vectors in each optimization step and better regularizing the models, the two smoothing methods are shown to bring significant improvements in translation performance. In this work, we study how to best combine the methods and stack the improvements. Specifically, we vary the prior distributions to smooth with, the hyperparameters that control the smoothing strength, and the token selection procedures. We conduct extensive experiments on small datasets, evaluate the recipes on larger datasets, and examine the implications when back-translation is further used. Our results confirm cumulative improvements when input and output smoothing are used in combination, giving up to +1.9 BLEU scores on standard machine translation tasks and reveal reasons why these smoothing methods should be preferred.

Neural Language Modeling for Named Entity Recognition
Zhihong Lei | Weiyue Wang | Christian Dugast | Hermann Ney
Proceedings of the 28th International Conference on Computational Linguistics

Named entity recognition is a key component in various natural language processing systems, and neural architectures provide significant improvements over conventional approaches. Regardless of different word embedding and hidden layer structures of the networks, a conditional random field layer is commonly used for the output. This work proposes to use a neural language model as an alternative to the conditional random field layer, which is more flexible for the size of the corpus. Experimental results show that the proposed system has a significant advantage in terms of training speed, with a marginal performance degradation.

Towards a Better Understanding of Label Smoothing in Neural Machine Translation
Yingbo Gao | Weiyue Wang | Christian Herold | Zijian Yang | Hermann Ney
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

In order to combat overfitting and in pursuit of better generalization, label smoothing is widely applied in modern neural machine translation systems. The core idea is to penalize over-confident outputs and regularize the model so that its outputs do not diverge too much from some prior distribution. While training perplexity generally gets worse, label smoothing is found to consistently improve test performance. In this work, we aim to better understand label smoothing in the context of neural machine translation. Theoretically, we derive and explain exactly what label smoothing is optimizing for. Practically, we conduct extensive experiments by varying which tokens to smooth, tuning the probability mass to be deducted from the true targets and considering different prior distributions. We show that label smoothing is theoretically well-motivated, and by carefully choosing hyperparameters, the practical performance of strong neural machine translation systems can be further improved.
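The core idea can be sketched in a few lines: some probability mass is deducted from the true target and redistributed according to a prior distribution, and the model is trained with cross-entropy against the smoothed target. The function names, the default of 0.1 for the deducted mass, and the uniform default prior are assumptions made for this illustration. The paper studies several choices of tokens, mass, and priors.

```python
import math


def smoothed_targets(true_index, vocab_size, epsilon=0.1, prior=None):
    """Mix the one-hot target with a prior distribution (uniform by default)."""
    if prior is None:
        prior = [1.0 / vocab_size] * vocab_size
    return [
        (1.0 - epsilon) * (1.0 if i == true_index else 0.0) + epsilon * prior[i]
        for i in range(vocab_size)
    ]


def cross_entropy(target_dist, model_probs):
    """Cross-entropy between a target distribution and model probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target_dist, model_probs) if t > 0.0)


# Hypothetical model output over a four-word vocabulary; word 0 is the true target.
model_probs = [0.7, 0.1, 0.1, 0.1]
targets = smoothed_targets(true_index=0, vocab_size=4, epsilon=0.1)

print(targets)
print(cross_entropy(targets, model_probs))
```

With the deducted mass set to zero, the smoothed target reduces to the one-hot vector and the loss to the usual negative log-likelihood of the true word.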

Predicting and Using Target Length in Neural Machine Translation
Zijian Yang | Yingbo Gao | Weiyue Wang | Hermann Ney
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Attention-based encoder-decoder models have achieved great success in neural machine translation tasks. However, the lengths of the target sequences are not explicitly predicted in these models. This work proposes length prediction as an auxiliary task and sets up a sub-network to obtain the length information from the encoder. Experimental results show that the length prediction sub-network brings improvements over the strong baseline system and that the predicted length can be used as an alternative to length normalization during decoding.

Context-aware neural machine translation (NMT) is a promising direction to improve the translation quality by making use of the additional context, e.g., document-level translation, or having meta-information. Although there exist various architectures and analyses, the effectiveness of different context-aware NMT models is not well explored yet. This paper analyzes the performance of document-level NMT models on four diverse domains with a varied amount of parallel document-level bilingual data. We conduct a comprehensive set of experiments to investigate the impact of document-level NMT. We find that there is no single best approach to document-level NMT, but rather that different architectures come out on top on different tasks. Looking at task-specific problems, such as pronoun resolution or headline translation, we find improvements in the context-aware systems, even in cases where the corpus-level metrics like BLEU show no significant improvement. We also show that document-level back-translation significantly helps to compensate for the lack of document-level bi-texts.

Towards a Better Evaluation of Metrics for Machine Translation
Peter Stanchev | Weiyue Wang | Hermann Ney
Proceedings of the Fifth Conference on Machine Translation

An important aspect of machine translation is its evaluation, which can be achieved through the use of a variety of metrics. To compare these metrics, the workshop on statistical machine translation annually evaluates metrics based on their correlation with human judgement. Over the years, methods for measuring correlation with humans have changed, but little research has been performed on what the optimal methods for acquiring human scores are and how human correlation can be measured. In this work, the methods for evaluating metrics at both system- and segment-level are analyzed in detail and their shortcomings are pointed out.

Investigation of Transformer-based Latent Attention Models for Neural Machine Translation
Parnia Bahar | Nikita Makarov | Hermann Ney
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Successfully Applying the Stabilized Lottery Ticket Hypothesis to the Transformer Architecture
Christopher Brix | Parnia Bahar | Hermann Ney
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Sparse models require less memory for storage and enable faster inference by reducing the necessary number of FLOPs. This is relevant both for time-critical and on-device computations using neural networks. The stabilized lottery ticket hypothesis states that networks can be pruned after no or few training iterations, using a mask computed based on the unpruned converged model. On the transformer architecture and the WMT 2014 English-to-German and English-to-French tasks, we show that stabilized lottery ticket pruning performs similarly to magnitude pruning for sparsity levels of up to 85%, and propose a new combination of pruning techniques that outperforms all other techniques for even higher levels of sparsity. Furthermore, we confirm that the parameter’s initial sign and not its specific value is the primary factor for successful training, and show that magnitude pruning cannot be used to find winning lottery tickets.

When and Why is Unsupervised Neural Machine Translation Useless?
Yunsu Kim | Miguel Graça | Hermann Ney
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This paper studies the practicality of the current state-of-the-art unsupervised methods in neural machine translation (NMT). In ten translation tasks with various data settings, we analyze the conditions under which the unsupervised methods fail to produce reasonable translations. We show that their performance is severely affected by linguistic dissimilarity and domain mismatch between source and target monolingual data. Such conditions are common for low-resource language pairs, where unsupervised learning works poorly. In all of our experiments, supervised and semi-supervised baselines with 50k-sentence bilingual data outperform the best unsupervised results. Our analyses pinpoint the limits of the current unsupervised NMT and also suggest immediate research directions.

Multi-Agent Mutual Learning at Sentence-Level and Token-Level for Neural Machine Translation
Baohao Liao | Yingbo Gao | Hermann Ney
Findings of the Association for Computational Linguistics: EMNLP 2020

Mutual learning, where multiple agents learn collaboratively and teach one another, has been shown to be an effective way to distill knowledge for image classification tasks. In this paper, we extend mutual learning to the machine translation task and operate at both the sentence-level and the token-level. Firstly, we co-train multiple agents by using the same parallel corpora. After convergence, each agent selects and learns its poorly predicted tokens from other agents. The poorly predicted tokens are determined by the acceptance-rejection sampling algorithm. Our experiments show that sequential mutual learning at the sentence-level and the token-level improves the results cumulatively. Absolute improvements compared to strong baselines are obtained on various translation tasks. On the IWSLT’14 German-English task, we get a new state-of-the-art BLEU score of 37.0. We also report a competitive result, 29.9 BLEU score, on the WMT’14 English-German task.

2019

pdf abs
Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies
Yunsu Kim | Yingbo Gao | Hermann Ney
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Transfer learning or multilingual model is essential for low-resource neural machine translation (NMT), but the applicability is limited to cognate languages by sharing their vocabularies. This paper shows effective techniques to transfer a pretrained NMT model to a new, unrelated language without shared vocabularies. We relieve the vocabulary mismatch by using cross-lingual word embedding, train a more language-agnostic encoder by injecting artificial noises, and generate synthetic data easily from the pretraining data without back-translation. Our methods do not require restructuring the vocabulary or retraining the model. We improve plain NMT transfer by up to +5.1% BLEU in five low-resource translation tasks, outperforming multilingual joint training by a large margin. We also provide extensive ablation studies on pretrained embedding, synthetic data, vocabulary size, and parameter freezing for a better understanding of NMT transfer.

pdf abs
Pivot-based Transfer Learning for Neural Machine Translation between Non-English Languages
Yunsu Kim | Petre Petrov | Pavel Petrushkov | Shahram Khadivi | Hermann Ney
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We present effective pre-training strategies for neural machine translation (NMT) using parallel corpora involving a pivot language, i.e., source-pivot and pivot-target, leading to a significant improvement in source-target translation. We propose three methods to increase the relation among source, pivot, and target languages in the pre-training: 1) step-wise training of a single model for different language pairs, 2) additional adapter component to smoothly connect pre-trained encoder and decoder, and 3) cross-lingual encoder training via autoencoding of the pivot language. Our methods greatly outperform multilingual models up to +2.6% BLEU in WMT 2019 French-German and German-Czech tasks. We show that our improvements are valid also in zero-shot/zero-resource scenarios.

pdf abs
uniblock: Scoring and Filtering Corpus with Unicode Block Information
Yingbo Gao | Weiyue Wang | Hermann Ney
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The preprocessing pipelines in Natural Language Processing usually involve a step of removing sentences consisting of illegal characters. The definition of illegal characters and the specific removal strategy depend on the task, language, domain, etc., which often leads to tiresome and repetitive scripting of rules. In this paper, we introduce a simple statistical method, uniblock, to overcome this problem. For each sentence, uniblock generates a fixed-size feature vector using Unicode block information of the characters. A Gaussian mixture model is then estimated on some clean corpus using variational inference. The learned model can then be used to score sentences and filter corpus. We present experimental results on Sentiment Analysis, Language Modeling and Machine Translation, and show the simplicity and effectiveness of our method.
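As a rough illustration of the feature step described in this abstract, the sketch below maps a sentence to a fixed-size vector of per-block character fractions. It is not the uniblock implementation: the four-block table (`BLOCKS`), the extra "other" bucket, the normalization by sentence length, and the name `block_features` are all assumptions made for the example, and the Gaussian mixture scoring step is omitted.

```python
# Hypothetical, truncated block table; the real Unicode standard defines many more blocks.
BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Cyrillic", 0x0400, 0x04FF),
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
]


def block_features(sentence):
    """Return the fraction of characters falling into each block, plus an 'other' bucket."""
    counts = [0] * (len(BLOCKS) + 1)
    for ch in sentence:
        code_point = ord(ch)
        for i, (_, low, high) in enumerate(BLOCKS):
            if low <= code_point <= high:
                counts[i] += 1
                break
        else:
            # Character is in none of the listed blocks.
            counts[-1] += 1
    total = len(sentence)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]
```

A vector of this kind could then be passed to any density model fitted on clean text; sentences with unusual block mixtures would receive low scores.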

pdf abs
When and Why is Document-level Context Useful in Neural Machine Translation?
Yunsu Kim | Duc Thanh Tran | Hermann Ney
Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019)

Document-level context has received lots of attention for compensating neural machine translation (NMT) of isolated sentences. However, recent advances in document-level NMT focus on sophisticated integration of the context, explaining its improvement with only a few selected examples or targeted test sets. We extensively quantify the causes of improvements by a document-level model in general test sets, clarifying the limit of the usefulness of document-level context in NMT. We show that most of the improvements are not interpretable as utilizing the context. We also show that a minimal encoding is sufficient for the context modeling and very long context is not helpful for NMT.

pdf abs
Analysis of Positional Encodings for Neural Machine Translation
Jan Rosendahl | Viet Anh Khoa Tran | Weiyue Wang | Hermann Ney
Proceedings of the 16th International Conference on Spoken Language Translation

In this work we analyze and compare the behavior of the Transformer architecture when using different positional encoding methods. While absolute and relative positional encoding perform equally strong overall, we show that relative positional encoding is vastly superior (4.4% to 11.9% BLEU) when translating a sentence that is longer than any observed training sentence. We further propose and analyze variations of relative positional encoding and observe that the number of trainable parameters can be reduced without a performance loss, by using fixed encoding vectors or by removing some of the positional encoding vectors.

pdf abs
On Using SpecAugment for End-to-End Speech Translation
Parnia Bahar | Albert Zeyer | Ralf Schlüter | Hermann Ney
Proceedings of the 16th International Conference on Spoken Language Translation

This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation. SpecAugment is a low-cost implementation method applied directly to the audio input features and it consists of masking blocks of frequency channels, and/or time steps. We apply SpecAugment on end-to-end speech translation tasks and achieve up to +2.2% BLEU on LibriSpeech Audiobooks En→Fr and +1.2% on IWSLT TED-talks En→De by alleviating overfitting to some extent. We also examine the effectiveness of the method in a variety of data scenarios and show that the method also leads to significant improvements in various data conditions irrespective of the amount of training data.
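The masking described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the features are a list of time frames, each a list of frequency-channel values, applies exactly one frequency mask and one time mask, and fills masked cells with zero. The function name `spec_augment` and the parameters `max_freq_width` and `max_time_width` are hypothetical.

```python
import random


def spec_augment(features, max_freq_width=2, max_time_width=2, seed=None):
    """Return a copy of `features` with one block of channels and one block of frames zeroed."""
    rng = random.Random(seed)
    masked = [list(frame) for frame in features]  # copy so the input stays untouched
    num_frames = len(masked)
    num_channels = len(masked[0]) if masked else 0
    if num_frames == 0 or num_channels == 0:
        return masked

    # Frequency masking: zero a contiguous block of channels in every frame.
    f_width = rng.randint(0, min(max_freq_width, num_channels))
    f_start = rng.randint(0, num_channels - f_width)
    for frame in masked:
        for c in range(f_start, f_start + f_width):
            frame[c] = 0.0

    # Time masking: zero a contiguous block of whole frames.
    t_width = rng.randint(0, min(max_time_width, num_frames))
    t_start = rng.randint(0, num_frames - t_width)
    for t in range(t_start, t_start + t_width):
        masked[t] = [0.0] * num_channels

    return masked
```

Because the mask positions are redrawn on every call, applying the function anew each epoch shows the model a differently corrupted copy of the same utterance.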

pdf abs
Exploring Kernel Functions in the Softmax Layer for Contextual Word Classification
Yingbo Gao | Christian Herold | Weiyue Wang | Hermann Ney
Proceedings of the 16th International Conference on Spoken Language Translation

Prominently used in support vector machines and logistic regressions, kernel functions (kernels) can implicitly map data points into high dimensional spaces and make it easier to learn complex decision boundaries. In this work, by replacing the inner product function in the softmax layer, we explore the use of kernels for contextual word classification. In order to compare the individual kernels, experiments are conducted on standard language modeling and machine translation tasks. We observe a wide range of performances across different kernel settings. Extending the results, we look at the gradient properties, investigate various mixture strategies and examine the disambiguation abilities.

pdf abs
Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron
Yunsu Kim | Hendrik Rosendahl | Nick Rossenbach | Jan Rosendahl | Shahram Khadivi | Hermann Ney
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation. We train a multilayer perceptron on top of the sentence embeddings to extract good bilingual sentence pairs from nonparallel or noisy parallel data. Our approach shows promising performance on sentence alignment recovery and the WMT 2018 parallel corpus filtering tasks with only a single model.

pdf abs
Generalizing Back-Translation in Neural Machine Translation
Miguel Graça | Yunsu Kim | Julian Schamper | Shahram Khadivi | Hermann Ney
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

Back-translation (data augmentation by translating target monolingual data) is a crucial component in modern neural machine translation (NMT). In this work, we reformulate back-translation in the scope of cross-entropy optimization of an NMT model, clarifying its underlying mathematical assumptions and approximations beyond its heuristic usage. Our formulation covers broader synthetic data generation schemes, including sampling from a target-to-source NMT model. With this formulation, we point out fundamental problems of the sampling-based approaches and propose to remedy them by (i) disabling label smoothing for the target-to-source model and (ii) sampling from a restricted search space. Our statements are investigated on the WMT 2018 German↔English news translation task.

This paper describes the neural machine translation systems developed at the RWTH Aachen University for the German-English, Chinese-English and Kazakh-English news translation tasks of the Fourth Conference on Machine Translation (WMT19). For all tasks, the final submitted system is based on the Transformer architecture. We focus on improving data filtering and fine-tuning as well as systematically evaluating interesting approaches like unigram language model segmentation and transfer learning. For the De-En task, none of the tested methods gave a significant improvement over last year’s winning system and we end up with the same performance, resulting in 39.6% BLEU on newstest2019. In the Zh-En task, we show 1.3% BLEU improvement over our last year’s submission, which we mostly attribute to the splitting of long sentences during translation. We further report results on the Kazakh-English task where we gain improvements of 11.1% BLEU over our baseline system. On the same task we present a recent transfer learning approach, which uses half of the free parameters of our submission system and performs on par with it.

pdf abs
EED: Extended Edit Distance Measure for Machine Translation
Peter Stanchev | Weiyue Wang | Hermann Ney
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

Over the years a number of machine translation metrics have been developed in order to evaluate the accuracy and quality of machine-generated translations. Metrics such as BLEU and TER have been used for decades. However, with the rapid progress of machine translation systems, the need for better metrics is growing. This paper proposes an extension of the edit distance, which achieves better human correlation, whilst remaining fast, flexible and easy to understand.
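For orientation, the sketch below computes the plain character-level edit distance (Levenshtein distance) that metrics of this family build on. It is only the unextended baseline: the extensions that define EED are not reproduced here, and the function name `edit_distance` is chosen for this example.

```python
def edit_distance(hyp, ref):
    """Minimum number of insertions, deletions and substitutions turning `hyp` into `ref`."""
    # prev[j] holds the distance between the processed prefix of hyp and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance from hyp[:i] to the empty reference prefix
        for j, r in enumerate(ref, start=1):
            substitution = prev[j - 1] + (0 if h == r else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]
```

Dividing the result by the reference length gives a simple error rate, which is the usual way such a distance is turned into a comparable score.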

2018

pdf abs
Neural Hidden Markov Model for Machine Translation
Weiyue Wang | Derui Zhu | Tamer Alkhouli | Zixuan Gan | Hermann Ney
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Attention-based neural machine translation (NMT) models selectively focus on specific source positions to produce a translation, which brings significant improvements over pure encoder-decoder sequence-to-sequence models. This work investigates NMT while replacing the attention component. We study a neural hidden Markov model (HMM) consisting of neural network-based alignment and lexicon models, which are trained jointly using the forward-backward algorithm. We show that the attention component can be effectively replaced by the neural network alignment model and the neural HMM approach is able to provide comparable performance with the state-of-the-art attention-based models on the WMT 2017 German↔English and Chinese→English translation tasks.

pdf abs
RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition
Albert Zeyer | Tamer Alkhouli | Hermann Ney
Proceedings of ACL 2018, System Demonstrations

We compare the fast training and decoding speed of RETURNN of attention models for translation, due to fast CUDA LSTM kernels, and a fast pure TensorFlow beam search decoder. We show that a layer-wise pretraining scheme for recurrent attention models gives over 1% BLEU improvement absolute and it allows to train deeper recurrent encoder networks. Promising preliminary results on max. expected BLEU training are presented. We are able to train state-of-the-art models for translation and end-to-end models for speech recognition and show results on WMT 2017 and Switchboard. The flexibility of RETURNN allows a fast research feedback loop to experiment with alternative architectures, and its generality allows to use it on a wide range of applications.

pdf abs
Improving Neural Language Models with Weight Norm Initialization and Regularization
Christian Herold | Yingbo Gao | Hermann Ney
Proceedings of the Third Conference on Machine Translation: Research Papers

Embedding and projection matrices are commonly used in neural language models (NLM) as well as in other sequence processing networks that operate on large vocabularies. We examine such matrices in fine-tuned language models and observe that a NLM learns word vectors whose norms are related to the word frequencies. We show that by initializing the weight norms with scaled log word counts, together with other techniques, lower perplexities can be obtained in early epochs of training. We also introduce a weight norm regularization loss term, whose hyperparameters are tuned via a grid search. With this method, we are able to significantly improve perplexities on two word-level language modeling tasks (without dynamic evaluation): from 54.44 to 53.16 on Penn Treebank (PTB) and from 61.45 to 60.13 on WikiText-2 (WT2).

pdf abs
On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation
Tamer Alkhouli | Gabriel Bretschner | Hermann Ney
Proceedings of the Third Conference on Machine Translation: Research Papers

This work investigates the alignment problem in state-of-the-art multi-head attention models based on the transformer architecture. We demonstrate that alignment extraction in transformer models can be improved by augmenting an additional alignment head to the multi-head source-to-target attention component. This is used to compute sharper attention weights. We describe how to use the alignment head to achieve competitive performance. To study the effect of adding the alignment head, we simulate a dictionary-guided translation task, where the user wants to guide translation using pre-defined dictionary entries. Using the proposed approach, we achieve up to 3.8% BLEU improvement when using the dictionary, in comparison to 2.4% BLEU in the baseline case. We also propose alignment pruning to speed up decoding in alignment-based neural machine translation (ANMT), which speeds up translation by a factor of 1.8 without loss in translation performance. We carry out experiments on the shared WMT 2016 English→Romanian news task and the BOLT Chinese→English discussion forum task.

pdf abs
The RWTH Aachen University English-German and German-English Unsupervised Neural Machine Translation Systems for WMT 2018
Miguel Graça | Yunsu Kim | Julian Schamper | Jiahui Geng | Hermann Ney
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the unsupervised neural machine translation (NMT) systems of the RWTH Aachen University developed for the English ↔ German news translation task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). Our work is based on iterative back-translation using a shared encoder-decoder NMT model. We extensively compare different vocabulary types, word embedding initialization schemes and optimization methods for our model. We also investigate gating and weight normalization for the word embedding layer.

This paper describes the statistical machine translation systems developed at RWTH Aachen University for the German→English, English→Turkish and Chinese→English translation tasks of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use ensembles of neural machine translation systems based on the Transformer architecture. Our main focus is on the German→English task where we scored first with respect to all automatic metrics provided by the organizers. We identify data selection, fine-tuning, batch size and model dimension as important hyperparameters. In total we improve by 6.8% BLEU over our last year’s submission and by 4.8% BLEU over the winning system of the 2017 German→English task. In the English→Turkish task, we show 3.6% BLEU improvement over the last year’s winning system. We further report results on the Chinese→English task where we improve 2.2% BLEU on average over our baseline systems but stay behind the 2018 winning systems.

pdf abs
The RWTH Aachen University Filtering System for the WMT 2018 Parallel Corpus Filtering Task
Nick Rossenbach | Jan Rosendahl | Yunsu Kim | Miguel Graça | Aman Gokrani | Hermann Ney
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the submission of RWTH Aachen University for the De→En parallel corpus filtering task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use several rule-based, heuristic methods to preselect sentence pairs. These sentence pairs are scored with count-based and neural systems as language and translation models. In addition to single sentence-pair scoring, we further implement a simple redundancy removing heuristic. Our best performing corpus filtering system relies on recurrent neural language models and translation models based on the transformer architecture. A model trained on 10M randomly sampled tokens reaches a performance of 9.2% BLEU on newstest2018. Using our filtering and ranking techniques we achieve 34.8% BLEU.

pdf abs
Improving Unsupervised Word-by-Word Translation with Language Model and Denoising Autoencoder
Yunsu Kim | Jiahui Geng | Hermann Ney
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Unsupervised learning of cross-lingual word embedding offers elegant matching of words across languages, but has fundamental limitations in translating sentences. In this paper, we propose simple yet effective methods to improve word-by-word translation of cross-lingual embeddings, using only monolingual corpora but without any back-translation. We integrate a language model for context-aware search, and use a novel denoising autoencoder to handle reordering. Our system surpasses state-of-the-art unsupervised translation systems without costly iterative training. We also analyze the effect of vocabulary size and denoising type on the translation performance, which provides better understanding of learning the cross-lingual word embedding and its usage in translation.

pdf abs
Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation
Parnia Bahar | Christopher Brix | Hermann Ney
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This work investigates an alternative model for neural machine translation (NMT) and proposes a novel architecture, where we employ a multi-dimensional long short-term memory (MDLSTM) for translation modelling. In the state-of-the-art methods, source and target sentences are treated as one-dimensional sequences over time, while we view translation as a two-dimensional (2D) mapping using an MDLSTM layer to define the correspondence between source and target words. We extend beyond the current sequence to sequence backbone NMT models to a 2D structure in which the source and target sentences are aligned with each other in a 2D grid. Our proposed topology shows consistent improvements over attention-based sequence to sequence model on two WMT 2017 tasks, German↔English.

pdf abs
Sisyphus, a Workflow Manager Designed for Machine Translation and Automatic Speech Recognition
Jan-Thorsten Peter | Eugen Beck | Hermann Ney
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Training and testing many possible parameters or model architectures of state-of-the-art machine translation or automatic speech recognition system is a cumbersome task. They usually require a long pipeline of commands reaching from pre-processing the training data to post-processing and evaluating the output.

2017

pdf abs
Unsupervised Training for Large Vocabulary Translation Using Sparse Lexicon and Word Classes
Yunsu Kim | Julian Schamper | Hermann Ney
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We address for the first time unsupervised training for a translation task with hundreds of thousands of vocabulary words. We scale up the expectation-maximization (EM) algorithm to learn a large translation table without any parallel text or seed lexicon. First, we solve the memory bottleneck and enforce the sparsity with a simple thresholding scheme for the lexicon. Second, we initialize the lexicon training with word classes, which efficiently boosts the performance. Our methods produced promising results on two large-scale unsupervised translation tasks.

pdf abs
The RWTH Aachen Machine Translation Systems for IWSLT 2017
Parnia Bahar | Jan Rosendahl | Nick Rossenbach | Hermann Ney
Proceedings of the 14th International Conference on Spoken Language Translation

This work describes the Neural Machine Translation (NMT) system of the RWTH Aachen University developed for the English↔German tracks of the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2017. We use NMT systems which are augmented by state-of-the-art extensions. Furthermore, we experiment with techniques that include data filtering, a larger vocabulary, two extensions to the attention mechanism and domain adaptation. Using these methods, we can show considerable improvements over the respective baseline systems and our IWSLT 2016 submission.

pdf
Biasing Attention-Based Recurrent Neural Networks Using External Alignment Information
Tamer Alkhouli | Hermann Ney
Proceedings of the Second Conference on Machine Translation

pdf abs
Hybrid Neural Network Alignment and Lexicon Model in Direct HMM for Statistical Machine Translation
Weiyue Wang | Tamer Alkhouli | Derui Zhu | Hermann Ney
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Recently, neural machine translation systems have shown promising performance and surpassed phrase-based systems for most translation tasks. Retreating into conventional concepts of machine translation while utilizing effective neural models is vital for comprehending the leap accomplished by neural machine translation over phrase-based methods. This work proposes a direct HMM with neural network-based lexicon and alignment models, which are trained jointly using the Baum-Welch algorithm. The direct HMM is applied to rerank the n-best list created by a state-of-the-art phrase-based translation system and it provides improvements of up to 1.0% BLEU on two different translation tasks.

2016

pdf
A Comparative Study on Vocabulary Reduction for Phrase Table Smoothing
Yunsu Kim | Andreas Guta | Joern Wuebker | Hermann Ney
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

pdf
The RWTH Aachen University English-Romanian Machine Translation System for WMT 2016
Jan-Thorsten Peter | Tamer Alkhouli | Andreas Guta | Hermann Ney
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf
CharacTer: Translation Edit Rate on Character Level
Weiyue Wang | Jan-Thorsten Peter | Hendrik Rosendahl | Hermann Ney
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf
Exponentially Decaying Bag-of-Words Input Features for Feed-Forward Neural Network in Statistical Machine Translation
Jan-Thorsten Peter | Weiyue Wang | Hermann Ney
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf abs
The RWTH Aachen LVCSR system for IWSLT-2016 German Skype conversation recognition task
Wilfried Michel | Zoltán Tüske | M. Ali Basha Shaik | Ralf Schlüter | Hermann Ney
Proceedings of the 13th International Conference on Spoken Language Translation

In this paper the RWTH large vocabulary continuous speech recognition (LVCSR) systems developed for the IWSLT-2016 evaluation campaign are described. This evaluation campaign focuses on transcribing spontaneous speech from Skype recordings. State-of-the-art bidirectional long short-term memory (LSTM) and deep, multilingually boosted feed-forward neural network (FFNN) acoustic models are trained on narrow and broadband features. An open vocabulary approach using subword units is also considered. LSTM and count-based full word and hybrid backoff language modeling methods are used to model the morphological richness of the German language. All these approaches are combined using confusion network combination (CNC) to yield a competitive WER.

pdf abs
The RWTH Aachen Machine Translation System for IWSLT 2016
Jan-Thorsten Peter | Andreas Guta | Nick Rossenbach | Miguel Graça | Hermann Ney
Proceedings of the 13th International Conference on Spoken Language Translation

This work describes the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2016. We have participated in the MT track for the German→English language pair employing our state-of-the-art phrase-based system, neural machine translation implementation and our joint translation and reordering decoder. Furthermore, we have applied feed-forward and recurrent neural language and translation models for reranking. The attention-based approach has been used for reranking the n-best lists for both phrase-based and hierarchical setups. On top of these systems, we make use of system combination to enhance the translation quality by combining individually trained systems.

2015

pdf
A Comparison of Update Strategies for Large-Scale Maximum Expected BLEU Training
Joern Wuebker | Sebastian Muehr | Patrick Lehnen | Stephan Peitz | Hermann Ney
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
A Comparison between Count and Neural Network Models Based on Joint Translation and Reordering Sequences
Andreas Guta | Tamer Alkhouli | Jan-Thorsten Peter | Joern Wuebker | Hermann Ney
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf
The RWTH Aachen German-English Machine Translation System for WMT 2015
Jan-Thorsten Peter | Farzad Toutounchi | Joern Wuebker | Hermann Ney
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf
Extended Translation Models in Phrase-based Decoding
Andreas Guta | Joern Wuebker | Miguel Graça | Yunsu Kim | Hermann Ney
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf
Investigations on Phrase-based Decoding with Recurrent Neural Network Language and Translation Models
Tamer Alkhouli | Felix Rietig | Hermann Ney
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf
Local System Voting Feature for Machine Translation System Combination
Markus Freitag | Jan-Thorsten Peter | Stephan Peitz | Minwei Feng | Hermann Ney
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf
UNRAVEL—A Decipherment Toolkit
Malte Nuhn | Julian Schamper | Hermann Ney
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
The RWTH Aachen machine translation system for IWSLT 2015
Jan-Thorsten Peter | Farzad Toutounchi | Stephan Peitz | Parnia Bahar | Andreas Guta | Hermann Ney
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign

2014

EU-BRIDGE is a European research project which is aimed at developing innovative speech translation technology. One of the collaborative efforts within EU-BRIDGE is to produce joint submissions of up to four different partners to the evaluation campaign at the 2014 International Workshop on Spoken Language Translation (IWSLT). We submitted combined translations to the German→English spoken language translation (SLT) track as well as to the German→English, English→German and English→French machine translation (MT) tracks. In this paper, we present the techniques which were applied by the different individual translation systems of RWTH Aachen University, the University of Edinburgh, Karlsruhe Institute of Technology, and Fondazione Bruno Kessler. We then show the combination approach developed at RWTH Aachen University which combined the individual systems. The consensus translations yield empirical gains of up to 2.3 points in BLEU and 1.2 points in TER compared to the best individual system.

pdf abs
The RWTH Aachen machine translation systems for IWSLT 2014
Joern Wuebker | Stephan Peitz | Andreas Guta | Hermann Ney
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

This work describes the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2014. We participated in both the MT and SLT tracks for the English→French and German→English language pairs and applied the identical training pipeline and models on both language pairs. Our state-of-the-art phrase-based baseline systems are augmented with maximum expected BLEU training for phrasal, lexical and reordering models. Further, we apply rescoring with novel recurrent neural language and translation models. The same systems are used for the SLT track, where we additionally perform punctuation prediction on the automatic transcriptions employing hierarchical phrase-based translation. We are able to improve RWTH’s 2013 evaluation systems by 1.7-1.8% BLEU absolute.

pdf abs
Better punctuation prediction with hierarchical phrase-based translation
Stephan Peitz | Markus Freitag | Hermann Ney
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers

Punctuation prediction is an important task in spoken language translation and can be performed by using a monolingual phrase-based translation system to translate from unpunctuated text to text with punctuation. However, a punctuation prediction system based on phrase-based translation is not able to capture long-range dependencies between words and punctuation marks. In this paper, we propose to employ hierarchical translation in place of phrase-based translation and show that this approach is more robust for unseen word sequences. Furthermore, we analyze different optimization criteria for tuning the scaling factors of a monolingual statistical machine translation system. In our experiments, we compare the new approach with other punctuation prediction methods and show improvements in terms of F1-Score and BLEU on the IWSLT 2014 German→English and English→French translation tasks.

pdf abs
Extensions of the Sign Language Recognition and Translation Corpus RWTH-PHOENIX-Weather
Jens Forster | Christoph Schmidt | Oscar Koller | Martin Bellgardt | Hermann Ney
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper introduces the RWTH-PHOENIX-Weather 2014, a video-based, large vocabulary, German sign language corpus which has been extended over the last two years, tripling the size of the original corpus. The corpus contains weather forecasts simultaneously interpreted into sign language which were recorded from German public TV and manually annotated using glosses on the sentence level and semi-automatically transcribed spoken German extracted from the videos using the open-source speech recognition system RASR. Spatial annotations of the signers’ hands as well as shape and orientation annotations of the dominant hand have been added for more than 40k and 10k video frames, respectively, creating one of the largest corpora allowing for quantitative evaluation of object tracking algorithms. Further, over 2k signs have been annotated using the SignWriting annotation system, focusing on the shape, orientation, movement as well as spatial contacts of both hands. Finally, extended recognition and translation setups are defined, and baseline results are presented.

pdf
German Compounds and Statistical Machine Translation. Can they get along?
Carla Parra Escartín | Stephan Peitz | Hermann Ney
Proceedings of the 10th Workshop on Multiword Expressions (MWE)

pdf
The RWTH Aachen German-English Machine Translation System for WMT 2014
Stephan Peitz | Joern Wuebker | Markus Freitag | Hermann Ney
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf
Unsupervised Adaptation for Statistical Machine Translation
Saab Mansour | Hermann Ney
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Vector Space Models for Phrase-based Machine Translation
Tamer Alkhouli | Andreas Guta | Hermann Ney
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf
Translation Modeling with Bidirectional Recurrent Neural Networks
Martin Sundermeyer | Tamer Alkhouli | Joern Wuebker | Hermann Ney
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
Improved Decipherment of Homophonic Ciphers
Malte Nuhn | Julian Schamper | Hermann Ney
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
EM Decipherment for Large Vocabularies
Malte Nuhn | Hermann Ney
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf
Jane: Open Source Machine Translation System Combination
Markus Freitag | Matthias Huck | Hermann Ney
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Simple and Effective Approach for Consistent Training of Hierarchical Phrase-based Translation Models
Stephan Peitz | David Vilar | Hermann Ney
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

pdf
Translation model based weighting for phrase extraction
Saab Mansour | Hermann Ney
Proceedings of the 17th Annual Conference of the European Association for Machine Translation

For the task of online translation of scientific video lectures, using huge models is not possible. In order to get smaller and efficient models, we perform data selection. In this paper, we perform a qualitative and quantitative comparison of several data selection techniques, based on cross-entropy and infrequent n-gram criteria. In terms of BLEU, a combination of translation and language model cross-entropy achieves the most stable results. As another important criterion for measuring translation quality in our application, we identify the number of out-of-vocabulary words. Here, infrequent n-gram recovery shows superior performance. Finally, we combine the two selection techniques in order to benefit from both their strengths.

2013

pdf
Improving Statistical Machine Translation with Word Class Models
Joern Wuebker | Stephan Peitz | Felix Rietig | Hermann Ney
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf
Advancements in Reordering Models for Statistical Machine Translation
Minwei Feng | Jan-Thorsten Peter | Hermann Ney
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Decipherment Complexity in 1:1 Substitution Ciphers
Malte Nuhn | Hermann Ney
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Beam Search for Solving Substitution Ciphers
Malte Nuhn | Julian Schamper | Hermann Ney
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Phrase Training Based Adaptation for Statistical Machine Translation
Saab Mansour | Hermann Ney
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
A Performance Study of Cube Pruning for Large-Scale Hierarchical Machine Translation
Matthias Huck | David Vilar | Markus Freitag | Hermann Ney
Proceedings of the Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf
Length-Incremental Phrase Training for SMT
Joern Wuebker | Hermann Ney
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf
A Phrase Orientation Model for Hierarchical Machine Translation
Matthias Huck | Joern Wuebker | Felix Rietig | Hermann Ney
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf bib
Statistical MT Systems Revisited: How much Hybridity do they have?
Hermann Ney
Proceedings of the Second Workshop on Hybrid Approaches to Translation

pdf
Improving Continuous Sign Language Recognition: Speech Recognition Techniques and System Design
Jens Forster | Oscar Koller | Christian Oberdörfer | Yannick Gweth | Hermann Ney
Proceedings of the Fourth Workshop on Speech and Language Processing for Assistive Technologies

This work describes the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign International Workshop on Spoken Language Translation (IWSLT) 2013. We participated in the English→French, English↔German, Arabic→English, Chinese→English and Slovenian↔English MT tracks and the English→French and English→German SLT tracks. We apply phrase-based and hierarchical SMT decoders, which are augmented by state-of-the-art extensions. The novel techniques we experimentally evaluate include discriminative phrase training, a continuous space language model, a hierarchical reordering model, a word class language model, domain adaptation via data selection and system combination of standard and reverse order models. By application of these methods we can show considerable improvements over the respective baseline systems.

In this paper, German and English large vocabulary continuous speech recognition (LVCSR) systems developed by the RWTH Aachen University for the IWSLT-2013 evaluation campaign are presented. Good improvements are obtained with state-of-the-art monolingual and multilingual bottleneck features. In addition, an open vocabulary approach using morphemic sub-lexical units is investigated along with the language model adaptation for the German LVCSR. For both the languages, competitive WERs are achieved using system combination.

EU-BRIDGE is a European research project which is aimed at developing innovative speech translation technology. This paper describes one of the collaborative efforts within EU-BRIDGE to further advance the state of the art in machine translation between two European language pairs, English→French and German→English. Four research institutions involved in the EU-BRIDGE project combined their individual machine translation systems and participated with a joint setup in the machine translation track of the evaluation campaign at the 2013 International Workshop on Spoken Language Translation (IWSLT). We present the methods and techniques to achieve high translation quality for text translation of talks which are applied at RWTH Aachen University, the University of Edinburgh, Karlsruhe Institute of Technology, and Fondazione Bruno Kessler. We then show how we have been able to considerably boost translation performance (as measured in terms of the metrics BLEU and TER) by means of system combination. The joint setups yield empirical gains of up to 1.4 points in BLEU and 2.8 points in TER on the IWSLT test sets compared to the best single systems.

pdf bib abs
Using viseme recognition to improve a sign language translation system
Christoph Schmidt | Oscar Koller | Hermann Ney | Thomas Hoyoux | Justus Piater
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

Sign language-to-text translation systems are similar to spoken language translation systems in that they consist of a recognition phase and a translation phase. First, the video of a person signing is transformed into a transcription of the signs, which is then translated into the text of a spoken language. One distinctive feature of sign languages is their multi-modal nature, as they can express meaning simultaneously via hand movements, body posture and facial expressions. In some sign languages, certain signs are accompanied by mouthings, i.e. the person silently pronounces the word while signing. In this work, we closely integrate a recognition and translation framework by adding a viseme recognizer (“lip reading system”) based on an active appearance model and by optimizing the recognition system to improve the translation output. The system outperforms the standard approach of separate recognition and translation.

pdf
(Hidden) Conditional Random Fields Using Intermediate Classes for Statistical Machine Translation
Patrick Lehnen | Joern Wuebker | Jan-Thorsten Peter | Stephan Peitz | Hermann Ney
Proceedings of Machine Translation Summit XIV: Papers

pdf
Reverse Word Order Model
Markus Freitag | Minwei Feng | Matthias Huck | Stephan Peitz | Hermann Ney
Proceedings of Machine Translation Summit XIV: Papers

pdf
SIGNSPEAK: Scientific Understanding and Vision-based Technological Development for Continuous Sign Language Recognition and Translation
Jens Forster | Christoph Schmidt | Hermann Ney
Proceedings of Machine Translation Summit XIV: European projects

2012

pdf
Insertion and Deletion Models for Statistical Machine Translation
Matthias Huck | Hermann Ney
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In this paper, the automatic speech recognition (ASR) and statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2012 are presented. We participated in the ASR (English), MT (English-French, Arabic-English, Chinese-English, German-English) and SLT (English-French) tracks. For the MT track both hierarchical and phrase-based SMT decoders are applied. A number of different techniques are evaluated in the MT and SLT tracks, including domain adaptation via data selection, translation model interpolation, phrase training for hierarchical and phrase-based systems, additional reordering model, word class language model, various Arabic and Chinese segmentation methods, postprocessing of speech recognition output with an SMT system, and system combination. By application of these methods we can show considerable improvements over the respective baseline systems.

pdf abs
A simple and effective weighted phrase extraction for machine translation adaptation
Saab Mansour | Hermann Ney
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers

The task of domain-adaptation attempts to exploit data mainly drawn from one domain (e.g. news) to maximize the performance on the test domain (e.g. weblogs). In previous work, weighting the training instances was used for filtering dissimilar data. We extend this by incorporating the weights directly into the standard phrase training procedure of statistical machine translation (SMT). This allows the SMT system to make the decision whether to use a phrase translation pair or not, a more methodological way than discarding phrase pairs completely when using filtering. Furthermore, we suggest a combined filtering and weighting procedure to achieve better results while reducing the phrase table size. The proposed methods are evaluated in the context of Arabic-to-English translation on various conditions, where significant improvements are reported when using the suggested weighted phrase training. The weighting method also improves over filtering, and the combined filtering and weighting is better than a standalone filtering method. Finally, we experiment with mixture modeling, where additional improvements are reported when using weighted phrase extraction over a variety of baselines.

pdf abs
Sequence labeling-based reordering model for phrase-based SMT
Minwei Feng | Jan-Thorsten Peter | Hermann Ney
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers

For current statistical machine translation systems, reordering is still a major problem for language pairs like Chinese-English, where the source and target language have significant word order differences. In this paper, we propose a novel reordering model based on sequence labeling techniques. Our model converts the reordering problem into a sequence labeling problem, i.e. a tagging task. For the given source sentence, we assign each source token a label which contains the reordering information for that token. We also design an unaligned word tag so that the unaligned word phenomenon is automatically covered by the proposed model. Our reordering model is conditioned on the whole source sentence. Hence it is able to capture long-range dependencies in the source sentence. Although learning on large-scale tasks requires notable amounts of computational resources, the decoder makes use of the tagging information as soft constraints. Therefore, the training procedure of our model is computationally expensive for large tasks, while in the test phase (during translation) our model is very efficient. We carried out experiments on five Chinese-English NIST tasks trained with BOLT data. Results show that our model improves the baseline system by 1.32 BLEU and 1.53 TER on average.

pdf abs
Spoken language translation using automatically transcribed text in training
Stephan Peitz | Simon Wiesler | Markus Nußbaum-Thom | Hermann Ney
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers

In spoken language translation a machine translation system takes speech as input and translates it into another language. A standard machine translation system is trained on written language data and expects written language as input. In this paper we propose an approach to close the gap between the output of automatic speech recognition and the input of machine translation by training the translation system on automatically transcribed speech. In our experiments we show improvements of up to 0.9 BLEU points on the IWSLT 2012 English-to-French speech translation task.

pdf
Deciphering Foreign Language by Combining Language Models and Context Vectors
Malte Nuhn | Arne Mauser | Hermann Ney
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation
Joern Wuebker | Hermann Ney | Richard Zens
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf
The RWTH Aachen Machine Translation System for WMT 2012
Matthias Huck | Stephan Peitz | Markus Freitag | Malte Nuhn | Hermann Ney
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf
Phrase Model Training for Statistical Machine Translation with Word Lattices of Preprocessing Alternatives
Joern Wuebker | Hermann Ney
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf
A Tagging-style Reordering Model for Phrase-based SMT
Minwei Feng | Hermann Ney
Proceedings of the Workshop on Reordering for Statistical Machine Translation

pdf
Semantic Cohesion Model for Phrase-Based SMT
Minwei Feng | Weiwei Sun | Hermann Ney
Proceedings of COLING 2012

pdf
Forced Derivations for Hierarchical Machine Translation
Stephan Peitz | Arne Mauser | Joern Wuebker | Hermann Ney
Proceedings of COLING 2012: Posters

pdf abs
Arabic-Segmentation Combination Strategies for Statistical Machine Translation
Saab Mansour | Hermann Ney
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Arabic segmentation was already applied successfully for the task of statistical machine translation (SMT). Yet, there is no consistent comparison of the effect of different techniques and methods over the final translation quality. In this work, we use existing tools and further re-implement and develop new methods for segmentation. We compare the resulting SMT systems based on the different segmentation methods over the small IWSLT 2010 BTEC and the large NIST 2009 Arabic-to-English translation tasks. Our results show that for both small and large training data, segmentation yields strong improvements, but the differences between the top ranked segmenters are statistically insignificant. Due to the different methodologies that we apply for segmentation, we expect a complementary variation in the results achieved by each method. As done in previous work, we combine several segmentation schemes of the same model but achieve modest improvements. Next, we try a different strategy, where we combine the different segmentation methods rather than the different segmentation schemes. In this case, we achieve stronger improvements over the best single system. Finally, combining schemes and methods has another slight gain over the best combination strategy.

pdf abs
RWTH-PHOENIX-Weather: A Large Vocabulary Sign Language Recognition and Translation Corpus
Jens Forster | Christoph Schmidt | Thomas Hoyoux | Oscar Koller | Uwe Zelle | Justus Piater | Hermann Ney
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper introduces the RWTH-PHOENIX-Weather corpus, a video-based, large vocabulary corpus of German Sign Language suitable for statistical sign language recognition and translation. In contrast to most available sign language data collections, the RWTH-PHOENIX-Weather corpus has not been recorded for linguistic research but for the use in statistical pattern recognition. The corpus contains weather forecasts recorded from German public TV which are manually annotated using glosses distinguishing sign variants, and time boundaries have been marked on the sentence and the gloss level. Further, the spoken German weather forecast has been transcribed in a semi-automatic fashion using a state-of-the-art automatic speech recognition system. Moreover, an additional translation of the glosses into spoken German has been created to capture allowable translation variability. In addition to the corpus, experimental baseline results for hand and head tracking, statistical sign language recognition and translation are presented.

pdf abs
Pivot Lightly-Supervised Training for Statistical Machine Translation
Matthias Huck | Hermann Ney
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

In this paper, we investigate large-scale lightly-supervised training with a pivot language: We augment a baseline statistical machine translation (SMT) system that has been trained on human-generated parallel training corpora with large amounts of additional unsupervised parallel data; but instead of creating this synthetic data from monolingual source language data with the baseline system itself, or from target language data with a reverse system, we employ a parallel corpus of target language data and data in a pivot language. The pivot language data is automatically translated into the source language, resulting in a trilingual corpus with unsupervised source language side. We augment our baseline system with the unsupervised source-target parallel data. Experiments are conducted for the German-French language pair using the standard WMT newstest sets for development and testing. We obtain the unsupervised data by translating the English side of the English-French 10^9 corpus to German. With careful system design, we are able to achieve improvements of up to +0.4 points BLEU / -0.7 points TER over the baseline.

pdf
Discriminative Reordering Extensions for Hierarchical Phrase-Based Machine Translation
Matthias Huck | Stephan Peitz | Markus Freitag | Hermann Ney
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

2011

pdf
Advancements in Arabic-to-English Hierarchical Machine Translation
Matthias Huck | David Vilar | Daniel Stein | Hermann Ney
Proceedings of the 15th Annual Conference of the European Association for Machine Translation

In this paper the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2011 are presented. We participated in the MT (English-French, Arabic-English, Chinese-English) and SLT (English-French) tracks. Both hierarchical and phrase-based SMT decoders are applied. A number of different techniques are evaluated, including domain adaptation via monolingual and bilingual data selection, phrase training, different lexical smoothing methods, additional reordering models for the hierarchical system, various Arabic and Chinese segmentation methods, punctuation prediction for speech recognition output, and system combination. By application of these methods we can show considerable improvements over the respective baseline systems.

The Quaero program is an international project promoting research and industrial innovation on technologies for automatic analysis and classification of multimedia and multilingual documents. Within the program framework, research organizations and industrial partners collaborate to develop prototypes of innovating applications and services for access and usage of multimedia data. One of the topics addressed is the translation of spoken language. Each year, a project-internal evaluation is conducted by DGA to monitor the technological advances. This work describes the design and results of the 2011 evaluation campaign. The participating partners were RWTH, KIT, LIMSI and SYSTRAN. Their approaches are compared on both ASR output and reference transcripts of speech data for the translation between French and German. The results show that the developed techniques further the state of the art and improve translation quality.

This paper describes the speech-to-text systems used to provide automatic transcriptions used in the Quaero 2010 evaluation of Machine Translation from speech. Quaero (www.quaero.org) is a large research and industrial innovation program focusing on technologies for automatic analysis and classification of multimedia and multilingual documents. The ASR transcript is the result of a Rover combination of systems from three teams (KIT, RWTH, LIMSI+VR) for the French and German languages. The case-sensitive word error rates (WER) of the combined systems were respectively 20.8% and 18.1% on the 2010 evaluation data, relative WER reductions of 14.6% and 17.4% respectively over the best component system.

pdf bib abs
Lexicon models for hierarchical phrase-based machine translation
Matthias Huck | Saab Mansour | Simon Wiesler | Hermann Ney
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

In this paper, we investigate lexicon models for hierarchical phrase-based statistical machine translation. We study five types of lexicon models: a model which is extracted from word-aligned training data and, given the word alignment matrix, relies on pure relative frequencies [1]; the IBM model 1 lexicon [2]; a regularized version of IBM model 1; a triplet lexicon model variant [3]; and a discriminatively trained word lexicon model [4]. We explore source-to-target models with phrase-level as well as sentence-level scoring and target-to-source models with scoring on phrase level only. For the first two types of lexicon models, we compare several scoring variants. All models are used during search, i.e. they are incorporated directly into the log-linear model combination of the decoder. Phrase table smoothing with triplet lexicon models and with discriminative word lexicons are novel contributions. We also propose a new regularization technique for IBM model 1 by means of the Kullback-Leibler divergence with the empirical unigram distribution as regularization term. Experiments are carried out on the large-scale NIST Chinese→English translation task and on the English→French and Arabic→English IWSLT TED tasks. For Chinese→English and English→French, we obtain the best results by using the discriminative word lexicon to smooth our phrase tables.

pdf abs
Combining translation and language model scoring for domain-specific data filtering
Saab Mansour | Joern Wuebker | Hermann Ney
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

The increasing popularity of statistical machine translation (SMT) systems is introducing new domains of translation that need to be tackled. As many resources are already available, domain adaptation methods can be applied to utilize these resources in the most beneficial way for the new domain. We explore adaptation via filtering, using the cross-entropy scores to discard irrelevant sentences. We focus on filtering for two important components of an SMT system, namely the language model (LM) and the translation model (TM). Previous work has already applied LM cross-entropy based scoring for filtering. We argue that LM cross-entropy might be appropriate for LM filtering, but not as much for TM filtering. We develop a novel filtering approach based on combined TM and LM cross-entropy scores. We experiment with two large-scale translation tasks, the Arabic-to-English and English-to-French IWSLT 2011 TED Talks MT tasks. For LM filtering, we achieve strong perplexity improvements which carry over to the translation quality with improvements up to +0.4% BLEU. For TM filtering, the combined method achieves small but consistent improvements over the standalone methods. As a side effect of adaptation via filtering, the fully fledged SMT system vocabulary size and phrase table size are reduced by a factor of at least 2 while up to +0.6% BLEU improvement is observed.
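
As an illustration of the LM cross-entropy filtering mentioned in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes add-one-smoothed unigram models, uses the general-domain pool itself as the out-of-domain model, and ranks sentences by the difference between in-domain and out-of-domain cross-entropy. The combined TM and LM score proposed in the paper is not reproduced here. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return a function giving add-one-smoothed unigram log-probabilities."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words

    def logprob(token):
        return math.log((counts[token] + 1) / (total + vocab))

    return logprob


def cross_entropy(sentence, logprob):
    """Per-token cross-entropy (in nats) of a sentence under a model."""
    tokens = sentence.split()
    return -sum(logprob(t) for t in tokens) / max(len(tokens), 1)


def filter_by_cross_entropy_difference(pool, in_domain, keep):
    """Keep the `keep` pool sentences that look most in-domain.

    Assumption: the out-of-domain model is trained on the pool itself.
    A lower score means the sentence is closer to the in-domain data.
    """
    lp_in = train_unigram(in_domain)
    lp_out = train_unigram(pool)
    ranked = sorted(
        pool,
        key=lambda s: cross_entropy(s, lp_in) - cross_entropy(s, lp_out),
    )
    return ranked[:keep]


# Toy data, invented for illustration only.
in_domain = ["thank you for this talk", "this talk is about ideas"]
pool = [
    "the council adopted the regulation",
    "thank you for the ideas",
    "the regulation entered into force",
]
selected = filter_by_cross_entropy_difference(pool, in_domain, keep=1)
print(selected)  # ['thank you for the ideas']
```

In practice the unigram models would be replaced by n-gram language models, and the number of sentences kept would be tuned on development data.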

pdf abs
Modeling punctuation prediction as machine translation
Stephan Peitz | Markus Freitag | Arne Mauser | Hermann Ney
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

Punctuation prediction is an important task in Spoken Language Translation. The output of speech recognition systems does not typically contain punctuation marks. In this paper we analyze different methods for punctuation prediction and show improvements in the quality of the final translation output. In our experiments we compare the different approaches and show improvements of up to 0.8 BLEU points on the IWSLT 2011 English-French Speech Translation of Talks task using a translation system to translate from unpunctuated to punctuated text instead of a language model based punctuation prediction method. Furthermore, we do a system combination of the hypotheses of all our different approaches and get an additional improvement of 0.4 points in BLEU.

pdf abs
Soft string-to-dependency hierarchical machine translation
Jan-Thorsten Peter | Matthias Huck | Hermann Ney | Daniel Stein
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

In this paper, we dissect the influence of several target-side dependency-based extensions to hierarchical machine translation, including a dependency language model (LM). We pursue a non-restrictive approach that does not prohibit the production of hypotheses with malformed dependency structures. Since many questions remained open from previous and related work, we offer in-depth analysis of the influence of the language model order, the impact of dependency-based restrictions on the search space, and the information to be gained from dependency tree building during decoding. The application of a non-restrictive approach together with an integrated dependency LM scoring is a novel contribution which yields significant improvements for two large-scale translation tasks for the language pairs Chinese–English and German–French.

pdf
The RWTH System Combination System for WMT 2011
Gregor Leusch | Markus Freitag | Hermann Ney
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf
Lightly-Supervised Training for Hierarchical Phrase-Based Machine Translation
Matthias Huck | David Vilar | Daniel Stein | Hermann Ney
Proceedings of the First workshop on Unsupervised Learning in NLP

pdf bib
Towards Automatic Error Analysis of Machine Translation Output
Maja Popović | Hermann Ney
Computational Linguistics, Volume 37, Issue 4 - December 2011

2010

pdf
Training Phrase Translation Models with Leaving-One-Out
Joern Wuebker | Arne Mauser | Hermann Ney
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf
A Hybrid Morphologically Decomposed Factored Language Models for Arabic LVCSR
Amr El-Desoky | Ralf Schlüter | Hermann Ney
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf abs
A Cocktail of Deep Syntactic Features for Hierarchical Machine Translation
Daniel Stein | Stephan Peitz | David Vilar | Hermann Ney
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

In this work we review and compare three additional syntactic enhancements for the hierarchical phrase-based translation model, which have been presented in the last few years. We compare their performance when applied separately and study whether the combination may yield additional improvements. Our findings show that the models are complementary, and their combination achieves an increase of 1% in BLEU and a reduction of nearly 2% in TER. The models presented in this work are made available as part of the Jane open source machine translation toolkit.

pdf abs
A Source-side Decoding Sequence Model for Statistical Machine Translation
Minwei Feng | Arne Mauser | Hermann Ney
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

We propose a source-side decoding sequence language model for phrase-based statistical machine translation. This model is a reordering model in the sense that it helps the decoder find the correct decoding sequence. The model uses word-aligned bilingual training data. We show improved translation quality of up to 1.34% BLEU and 0.54% TER using this model compared to three other widely used reordering models.

pdf abs
A Comparison of Various Types of Extended Lexicon Models for Statistical Machine Translation
Matthias Huck | Martin Ratajczak | Patrick Lehnen | Hermann Ney
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

In this work we give a detailed comparison of the impact of the integration of discriminative and trigger-based lexicon models in state-of-the-art hierarchical and conventional phrase-based statistical machine translation systems. As both types of extended lexicon models can grow very large, we apply certain restrictions to discard some of the less useful information. We show how these restrictions facilitate the training of the extended lexicon models. We finally evaluate systems that incorporate both types of models with different restrictions on a large-scale translation task for the Arabic-English language pair. Our results suggest that extended lexicon models can be substantially reduced in size while still giving clear improvements in translation performance.

pdf abs
The RWTH Aachen machine translation system for IWSLT 2010
Saab Mansour | Stephan Peitz | David Vilar | Joern Wuebker | Hermann Ney
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper we describe the statistical machine translation system of the RWTH Aachen University developed for the translation task of the IWSLT 2010. This year, we participated in the BTEC translation task for the Arabic to English language direction. We experimented with two state-of-the-art decoders: phrase-based and hierarchical-based decoders. Extensions to the decoders included phrase training (as opposed to heuristic phrase extraction) for the phrase-based decoder, and soft syntactic features for the hierarchical decoder. Additionally, we experimented with various rule-based and statistical-based segmenters for Arabic. Due to the different decoders and the different methodologies that we apply for segmentation, we expect that there will be complementary variation in the results achieved by each system. The next step would be to exploit these variations and achieve better results by combining the systems. We try different strategies for system combination and report significant improvements over the best single system.

pdf abs
A combination of hierarchical systems with forced alignments from phrase-based systems
Carmen Heger | Joern Wuebker | David Vilar | Hermann Ney
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

Currently most state-of-the-art statistical machine translation systems present a mismatch between training and generation conditions. Word alignments are computed using the well-known IBM models for single-word based translation. Afterwards phrases are extracted using extraction heuristics, unrelated to the stochastic models applied for finding the word alignment. In the last years, several research groups have tried to overcome this mismatch, but only with limited success. Recently, the technique of forced alignments has been shown to improve translation quality for a phrase-based system, applying a more statistically sound approach to phrase extraction. In this work we investigate the first steps to combine forced alignment with a hierarchical model. Experimental results on IWSLT and WMT data show improvements in translation quality of up to 0.7% BLEU and 1.0% TER.

pdf abs
Multi-pivot translation by system combination
Gregor Leusch | Aurélien Max | Josep Maria Crego | Hermann Ney
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

This paper describes a technique to exploit multiple pivot languages when using machine translation (MT) on language pairs with scarce bilingual resources, or where no translation system for a language pair is available. The principal idea is to generate intermediate translations in several pivot languages, translate them separately into the target language, and generate a consensus translation out of these using MT system combination techniques. Our technique can also be applied when a translation system for a language pair is available, but is limited in its translation accuracy because of scarce resources. Using statistical MT systems for the 11 different languages of Europarl, we show experimentally that a direct translation system can be replaced by this pivot approach without a loss in translation quality if about six pivot languages are available. Furthermore, we can already improve an existing MT system by adding two pivot systems to it. The maximum improvement was found to be 1.4% abs. in BLEU in our experiments for 8 or more pivot languages.

pdf abs
Sign language machine translation overkill
Daniel Stein | Christoph Schmidt | Hermann Ney
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

Sign languages represent an interesting niche for statistical machine translation that is typically hampered by the scarceness of suitable data, and most papers in this area apply only a few, well-known techniques and do not adapt them to small-sized corpora. In this paper, we will propose new methods for common approaches like scaling factor optimization and alignment merging strategies which helped improve our baseline. We also conduct experiments with different decoders and employ state-of-the-art techniques like soft syntactic labels as well as trigger-based and discriminative word lexica and system combination. All methods are evaluated on one of the largest sign language corpora available.

pdf
If I only had a parser: poor man’s syntax for hierarchical machine translation
David Vilar | Daniel Stein | Stephan Peitz | Hermann Ney
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

pdf
Micro-adaptation lexicale en traduction automatique statistique [Lexical Micro-adaptation in Statistical Machine Translation]
Josep Maria Crego | Gregor Leusch | Aurélien Max | Hermann Ney | François Yvon
Traitement Automatique des Langues, Volume 51, Numéro 2 : Multilinguisme et traitement automatique des langues [Multilingualism and Natural Language Processing]

pdf
Jane: Open Source Hierarchical Translation, Extended with Reordering and Lexicon Models
David Vilar | Daniel Stein | Matthias Huck | Hermann Ney
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf
The RWTH System Combination System for WMT 2010
Gregor Leusch | Hermann Ney
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

The SignSpeak project will be the first step to approach sign language recognition and translation at a scientific level already reached in similar research fields such as automatic speech recognition or statistical machine translation of spoken languages. Deaf communities revolve around sign languages as they are their natural means of communication. Although deaf, hard of hearing and hearing signers can communicate without problems amongst themselves, there is a serious challenge for the deaf community in trying to integrate into educational, social and work environments. The overall goal of SignSpeak is to develop a new vision-based technology for recognizing and translating continuous sign language to text. New knowledge about the nature of sign language structure from the perspective of machine recognition of continuous sign language will allow a subsequent breakthrough in the development of a new vision-based technology for continuous sign language recognition and translation. Existing and new publicly available corpora will be used to evaluate the research progress throughout the whole project.

2009

pdf
Comparison of Extended Lexicon Models in Search and Rescoring for SMT
Saša Hasan | Hermann Ney
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf
Extending Statistical Machine Translation with Discriminative and Trigger-Based Lexicon Models
Arne Mauser | Saša Hasan | Hermann Ney
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf
Are Unaligned Words Important for Machine Translation?
Yuqi Zhang | Evgeny Matusov | Hermann Ney
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

pdf
On LM Heuristics for the Cube Growing Algorithm
David Vilar | Hermann Ney
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

pdf bib
Syntax-Oriented Evaluation Measures for Machine Translation Output
Maja Popović | Hermann Ney
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf
The RWTH System Combination System for WMT 2009
Gregor Leusch | Evgeny Matusov | Hermann Ney
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf
The RWTH Machine Translation System for WMT 2009
Maja Popović | David Vilar | Daniel Stein | Evgeny Matusov | Hermann Ney
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf
A Deep Learning Approach to Machine Transliteration
Thomas Deselaers | Saša Hasan | Oliver Bender | Hermann Ney
Proceedings of the Fourth Workshop on Statistical Machine Translation

2008

pdf abs
A Multi-Genre SMT System for Arabic to French
Saša Hasan | Hermann Ney
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This work presents improvements of a large-scale Arabic to French statistical machine translation system over a period of three years. The development includes better preprocessing, more training data, additional genre-specific tuning for different domains, namely newswire text and broadcast news transcripts, and improved domain-dependent language models. Starting with an early prototype in 2005 that participated in the second CESTA evaluation, the system was further upgraded to achieve favorable BLEU scores of 44.8% for the text and 41.1% for the audio setting. These results are compared to a system based on the freely available Moses toolkit. We show significant gains both in terms of translation quality (up to +1.2% BLEU absolute) and translation speed (up to 16 times faster) for comparable configuration settings.

pdf abs
Automatic Evaluation Measures for Statistical Machine Translation System Optimization
Arne Mauser | Saša Hasan | Hermann Ney
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Evaluation of machine translation (MT) output is a challenging task. In most cases, there is no single correct translation. In the extreme case, two translations of the same input can have completely different words and sentence structure while still both being perfectly valid. Large projects and competitions for MT research raised the need for reliable and efficient evaluation of MT systems. For the funding side, the obvious motivation is to measure performance and progress of research. This often results in a specific measure or metric taken as the primary evaluation criterion. Do improvements in one measure really lead to improved MT performance? How does a gain in one evaluation metric affect other measures? This paper is going to answer these questions by a number of experiments.

pdf abs
Benchmark Databases for Video-Based Automatic Sign Language Recognition
Philippe Dreuw | Carol Neidle | Vassilis Athitsos | Stan Sclaroff | Hermann Ney
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

A new, linguistically annotated, video database for automatic sign language recognition is presented. The new RWTH-BOSTON-400 corpus, which consists of 843 sentences, several speakers and separate subsets for training, development, and testing is described in detail. For evaluation and benchmarking of automatic sign language recognition, large corpora are needed. Recent research has focused mainly on isolated sign language recognition methods using video sequences that have been recorded under lab conditions using special hardware like data gloves. Such databases have generally consisted of only one speaker, and thus have been speaker-dependent, and have had only small vocabularies. A new database access interface, which was designed and created to provide fast access to the database statistics and content, makes it possible to easily browse and retrieve particular subsets of the video database. Preliminary baseline results on the new corpora are presented. In contradistinction to other research in this area, all databases presented in this paper will be publicly available.

Systems that automatically process sign language rely on appropriate data. We therefore present the ATIS sign language corpus that is based on the domain of air travel information. It is available for five languages, English, German, Irish sign language, German sign language and South African sign language. The corpus can be used for different tasks like automatic statistical translation and automatic sign language recognition and it allows the specific modeling of spatial references in signing space.

pdf abs
A Comparison of Various Methods for Concept Tagging for Spoken Language Understanding
Stefan Hahn | Patrick Lehnen | Christian Raymond | Hermann Ney
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The extraction of flat concepts out of a given word sequence is usually one of the first steps in building a spoken language understanding (SLU) or dialogue system. This paper explores five different modelling approaches for this task and presents results on a French state-of-the-art corpus, MEDIA. Additionally, two log-linear modelling approaches could be further improved by adding morphologic knowledge. This paper goes beyond what has been reported in the literature. We applied the models on the same training and testing data and used the NIST scoring toolkit to evaluate the experimental results to ensure identical conditions for each of the experiments and the comparability of the results. Using a model based on conditional random fields, we achieve a concept error rate of 11.8% on the MEDIA evaluation corpus.

pdf
Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation
Jia Xu | Jianfeng Gao | Kristina Toutanova | Hermann Ney
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

RWTH’s system for the 2008 IWSLT evaluation consists of a combination of different phrase-based and hierarchical statistical machine translation systems. We participated in the translation tasks for the Chinese-to-English and Arabic-to-English language pairs. We investigated different preprocessing techniques, reordering methods for the phrase-based system, including reordering of speech lattices, and syntax-based enhancements for the hierarchical systems. We also tried the combination of the Arabic-to-English and Chinese-to-English outputs as an additional submission.

pdf abs
Analysing soft syntax features and heuristics for hierarchical phrase based machine translation.
David Vilar | Daniel Stein | Hermann Ney
Proceedings of the 5th International Workshop on Spoken Language Translation: Papers

Similar to phrase-based machine translation, hierarchical systems produce a large proportion of phrases, most of which are supposedly junk and useless for the actual translation. For the hierarchical case, however, the amount of extracted rules is an order of magnitude bigger. In this paper, we investigate several soft constraints in the extraction of hierarchical phrases and whether these help as additional scores in the decoding to prune unneeded phrases. We show the methods that help best.

pdf abs
Improvements in dynamic programming beam search for phrase-based statistical machine translation.
Richard Zens | Hermann Ney
Proceedings of the 5th International Workshop on Spoken Language Translation: Papers

Search is a central component of any statistical machine translation system. We describe the search for phrase-based SMT in detail and show its importance for achieving good translation quality. We introduce an explicit distinction between reordering and lexical hypotheses and organize the pruning accordingly. We show that for the large Chinese-English NIST task already a small number of lexical alternatives is sufficient, whereas a large number of reordering hypotheses is required to achieve good translation quality. The resulting system compares favorably with the current state-of-the-art; in particular, we perform a comparison with cube pruning as well as with Moses.

pdf
Triplet Lexicon Models for Statistical Machine Translation
Saša Hasan | Juri Ganitkevitch | Hermann Ney | Jesús Andrés-Ferrer
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf
Complexity of Finding the BLEU-optimal Hypothesis in a Confusion Network
Gregor Leusch | Evgeny Matusov | Hermann Ney
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

pdf
Efficient Phrase-Table Representation for Machine Translation with Applications to Online MT and Speech Translation
Richard Zens | Hermann Ney
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf
Are Very Large N-Best Lists Useful for SMT?
Saša Hasan | Richard Zens | Hermann Ney
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

pdf
iROVER: Improving System Combination with Classification
Dustin Hillard | Bjoern Hoffmeister | Mari Ostendorf | Ralf Schlueter | Hermann Ney
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

pdf
Analysis and System Combination of Phrase- and N-Gram-Based Statistical Machine Translation Systems
Marta R. Costa-jussà | Josep M. Crego | David Vilar | José A. R. Fonollosa | José B. Mariño | Hermann Ney
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

pdf
Word-Level Confidence Estimation for Machine Translation
Nicola Ueffing | Hermann Ney
Computational Linguistics, Volume 33, Number 1, March 2007

pdf
Minimum Bayes Risk Decoding for BLEU
Nicola Ehling | Richard Zens | Hermann Ney
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

pdf
Combining data-driven MT systems for improved sign language translation
Sara Morrissey | Andy Way | Daniel Stein | Jan Bungeroth | Hermann Ney
Proceedings of Machine Translation Summit XI: Papers

pdf
Domain dependent statistical machine translation
Jia Xu | Yonggang Deng | Yuqing Gao | Hermann Ney
Proceedings of Machine Translation Summit XI: Papers

pdf bib
Statistical MT from TMI-1988 to TMI-2007: what has happened?
Hermann Ney
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Plenaries

pdf
Hand in hand: automatic sign language to English translation
Daniel Stein | Philippe Dreuw | Hermann Ney | Sara Morrissey | Andy Way
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

pdf bib
Chunk-Level Reordering of Source Language Sentences with Automatically Learned Rules for Statistical Machine Translation
Yuqi Zhang | Richard Zens | Hermann Ney
Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation

pdf
Can We Translate Letters?
David Vilar | Jan-Thorsten Peter | Hermann Ney
Proceedings of the Second Workshop on Statistical Machine Translation

pdf
Word Error Rates: Decomposition over POS classes and Applications for Error Analysis
Maja Popović | Hermann Ney
Proceedings of the Second Workshop on Statistical Machine Translation

pdf
Human Evaluation of Machine Translation Through Binary System Comparisons
David Vilar | Gregor Leusch | Hermann Ney | Rafael E. Banchs
Proceedings of the Second Workshop on Statistical Machine Translation

pdf
A Systematic Comparison of Training Criteria for Statistical Machine Translation
Richard Zens | Saša Hasan | Hermann Ney
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib abs
Improved chunk-level reordering for statistical machine translation
Yuqi Zhang | Richard Zens | Hermann Ney
Proceedings of the Fourth International Workshop on Spoken Language Translation

Inspired by previous chunk-level reordering approaches to statistical machine translation, this paper presents two methods to improve the reordering at the chunk level. By introducing a new lattice weighting factor and by reordering the training source data, an improvement is reported on TER and BLEU. Compared to the previous chunk-level reordering approach, the BLEU score improves by 1.4% absolute. The translation results are reported on the IWSLT Chinese-English task.

pdf abs
The RWTH machine translation system for IWSLT 2007
Arne Mauser | David Vilar | Gregor Leusch | Yuqi Zhang | Hermann Ney
Proceedings of the Fourth International Workshop on Spoken Language Translation

The RWTH system for the IWSLT 2007 evaluation is a combination of several statistical machine translation systems. The combination includes phrase-based models, an n-gram translation model and a hierarchical phrase model. We describe the individual systems and the method that was used for combining the system outputs. Compared to our 2006 system, we newly introduce a hierarchical phrase-based translation model and show improvements in system combination for machine translation. RWTH participated in the Italian-to-English and Chinese-to-English translation directions.

2006

pdf abs
Training a Statistical Machine Translation System without GIZA++
Arne Mauser | Evgeny Matusov | Hermann Ney
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The IBM Models (Brown et al., 1993) enjoy great popularity in the machine translation community because they offer high quality word alignments and a free implementation is available with the GIZA++ Toolkit (Och and Ney, 2003). Several methods have been developed to overcome the asymmetry of the alignment generated by the IBM Models. A remaining disadvantage, however, is the high model complexity. This paper describes a word alignment training procedure for statistical machine translation that uses a simple and clear statistical model, different from the IBM models. The main idea of the algorithm is to generate a symmetric and monotonic alignment between the target sentence and a permutation graph representing different reorderings of the words in the source sentence. The quality of the generated alignment is shown to be comparable to the standard GIZA++ training in an SMT setup.

pdf abs
Creating a Large-Scale Arabic to French Statistical Machine Translation System
Saša Hasan | Anas El Isbihani | Hermann Ney
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this work, the creation of a large-scale Arabic to French statistical machine translation system is presented. We introduce all necessary steps from corpus acquisition, preprocessing the data to training and optimizing the system and eventual evaluation. Since no corpora existed previously, we collected large amounts of data from the web. Arabic word segmentation was crucial to reduce the overall number of unknown words. We describe the phrase-based SMT system used for training and generation of the translation hypotheses. Results on the second CESTA evaluation campaign are reported. The setting was in the medical domain. The prototype reaches a favorable BLEU score of 40.8%.

pdf abs
POS-based Word Reorderings for Statistical Machine Translation
Maja Popović | Hermann Ney
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this work we investigate new possibilities for improving the quality of statistical machine translation (SMT) by applying word reorderings of the source language sentences based on Part-of-Speech tags. Results are presented on the European Parliament corpus containing about 700k sentences and 15M running words. In order to investigate sparse training data scenarios, we also report results obtained on about 1% of the original corpus. The source languages are Spanish and English and target languages are Spanish, English and German. We propose two types of reorderings depending on the language pair and the translation direction: local reorderings of nouns and adjectives for translation from and into Spanish and long-range reorderings of verbs for translation into German. For our best translation system, we achieve up to 2% relative reduction of WER and up to 7% relative increase of BLEU score. Improvements can be seen both on the reordered sentences as well as on the rest of the test corpus. Local reorderings are especially important for the translation systems trained on the small corpus whereas long-range reorderings are more effective for the larger corpus.

pdf abs
Error Analysis of Statistical Machine Translation Output
David Vilar | Jia Xu | Luis Fernando D’Haro | Hermann Ney
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Evaluation of automatic translation output is a difficult task. Several performance measures like Word Error Rate, Position Independent Word Error Rate and the BLEU and NIST scores are widely used and provide a useful tool for comparing different systems and evaluating improvements within a system. However, the interpretation of all of these measures is not at all clear, and the identification of the most prominent source of errors in a given system using these measures alone is not possible. Therefore some analysis of the generated translations is needed in order to identify the main problems and to focus the research efforts. This area is however mostly unexplored and few works have dealt with it until now. In this paper we will present a framework for classification of the errors of a machine translation system and we will carry out an error analysis of the system used by the RWTH in the first TC-STAR evaluation.

pdf abs
A German Sign Language Corpus of the Domain Weather Report
Jan Bungeroth | Daniel Stein | Philippe Dreuw | Morteza Zahedi | Hermann Ney
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

All systems for automatic sign language translation and recognition, in particular statistical systems, rely on adequately sized corpora. For this purpose, we created the Phoenix corpus that is based on German television weather reports translated into German Sign Language. It comes with a rich annotation of the video data, a bilingual text-based sentence corpus and a monolingual German corpus.

pdf
Integration of Speech to Computer-Assisted Translation Using Finite-State Automata
Shahram Khadivi | Richard Zens | Hermann Ney
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf
The RWTH statistical machine translation system for the IWSLT 2006 evaluation
Arne Mauser | Richard Zens | Evgeny Matusov | Sasa Hasan | Hermann Ney
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign

pdf bib
Automatic sentence segmentation and punctuation prediction for spoken language translation
Evgeny Matusov | Arne Mauser | Hermann Ney
Proceedings of the Third International Workshop on Spoken Language Translation: Papers

pdf
AER: do we need to “improve” our alignments?
David Vilar | Maja Popovic | Hermann Ney
Proceedings of the Third International Workshop on Spoken Language Translation: Papers

pdf
Computing Consensus Translation for Multiple Machine Translation Systems Using Enhanced Hypothesis Alignment
Evgeny Matusov | Nicola Ueffing | Hermann Ney
11th Conference of the European Chapter of the Association for Computational Linguistics

pdf
CDER: Efficient MT Evaluation Using Block Movements
Gregor Leusch | Nicola Ueffing | Hermann Ney
11th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Reranking Translation Hypotheses Using Structural Properties
Saša Hasan | Oliver Bender | Hermann Ney
Proceedings of the Workshop on Learning Structured Information in Natural Language Applications

pdf
Morpho-syntactic Arabic Preprocessing for Arabic to English Statistical Machine Translation
Anas El Isbihani | Shahram Khadivi | Oliver Bender | Hermann Ney
Proceedings on the Workshop on Statistical Machine Translation

pdf
Discriminative Reordering Models for Statistical Machine Translation
Richard Zens | Hermann Ney
Proceedings on the Workshop on Statistical Machine Translation

pdf
N-Gram Posterior Probabilities for Statistical Machine Translation
Richard Zens | Hermann Ney
Proceedings on the Workshop on Statistical Machine Translation

pdf
Partitioning Parallel Documents Using Binary Segmentation
Jia Xu | Richard Zens | Hermann Ney
Proceedings on the Workshop on Statistical Machine Translation

pdf
A Flexible Architecture for CAT Applications
Saša Hasan | Shahram Khadivi | Richard Zens | Hermann Ney
Proceedings of the 11th Annual Conference of the European Association for Machine Translation

pdf
Morpho-Syntax Based Statistical Methods for Automatic Sign Language Translation
Daniel Stein | Jan Bungeroth | Hermann Ney
Proceedings of the 11th Annual Conference of the European Association for Machine Translation

2005

pdf abs
One Decade of Statistical Machine Translation: 1996-2005
Hermann Ney
Proceedings of Machine Translation Summit X: Invited papers

In the last decade, the statistical approach has found widespread use in machine translation both for written and spoken language and has had a major impact on the translation accuracy. This paper will cover the principles of statistical machine translation and summarize the progress made so far.

pdf abs
Statistical Machine Translation of European Parliamentary Speeches
David Vilar | Evgeny Matusov | Sasa Hasan | Richard Zens | Hermann Ney
Proceedings of Machine Translation Summit X: Papers

In this paper we present the ongoing work at RWTH Aachen University for building a speech-to-speech translation system within the TC-Star project. The corpus we work on consists of parliamentary speeches held in the European Plenary Sessions. To our knowledge, this is the first project that focuses on speech-to-speech translation applied to a real-life task. We describe the statistical approach used in the development of our system and analyze its performance under different conditions: dealing with syntactically correct input, dealing with the exact transcription of speech and dealing with the (noisy) output of an automatic speech recognition system. Experimental results show that our system is able to perform adequately in each of these conditions.

pdf
Integrated Chinese Word Segmentation in Statistical Machine Translation
Jia Xu | Evgeny Matusov | Richard Zens | Hermann Ney
Proceedings of the Second International Workshop on Spoken Language Translation

pdf
Evaluating Machine Translation Output with Automatic Sentence Segmentation
Evgeny Matusov | Gregor Leusch | Oliver Bender | Hermann Ney
Proceedings of the Second International Workshop on Spoken Language Translation

pdf
Augmenting a Small Parallel Text with Morpho-Syntactic Language
Maja Popović | David Vilar | Hermann Ney | Slobodan Jovičić | Zoran Šarić
Proceedings of the ACL Workshop on Building and Using Parallel Texts

pdf
Novel Reordering Approaches in Phrase-Based Statistical Machine Translation
Stephan Kanthak | David Vilar | Evgeny Matusov | Richard Zens | Hermann Ney
Proceedings of the ACL Workshop on Building and Using Parallel Texts

pdf
Word Graphs for Statistical Machine Translation
Richard Zens | Hermann Ney
Proceedings of the ACL Workshop on Building and Using Parallel Texts

pdf
Preprocessing and Normalization for Automatic Evaluation of Machine Translation
Gregor Leusch | Nicola Ueffing | David Vilar | Hermann Ney
Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization

pdf
Word-Level Confidence Estimation for Machine Translation using Phrase-Based Translation Models
Nicola Ueffing | Hermann Ney
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf
Comparison of generation strategies for interactive machine translation
Oliver Bender | Saša Hasan | David Vilar | Richard Zens | Hermann Ney
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

pdf
Clustered language models based on regular expressions for SMT
Saša Hasan | Hermann Ney
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

pdf
Efficient statistical machine translation with constrained reordering
Evgeny Matusov | Stephan Kanthak | Hermann Ney
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

pdf
Exploiting phrasal lexica and additional morpho-syntactic language resources for statistical machine translation with scarce training data
Maja Popovic | Hermann Ney
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

pdf
Application of word-level confidence measures in interactive statistical machine translation
Nicola Ueffing | Hermann Ney
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

pdf
Sentence segmentation using IBM word alignment model 1
Jia Xu | Richard Zens | Hermann Ney
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

2004

pdf
FSA: An Efficient and Flexible C++ Toolkit for Finite State Automata Using On-Demand Computation
Stephan Kanthak | Hermann Ney
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf
Alignment templates: the RWTH SMT system
Oliver Bender | Richard Zens | Evgeny Matusov | Hermann Ney
Proceedings of the First International Workshop on Spoken Language Translation: Evaluation Campaign

pdf
Statistical machine translation of spontaneous speech with scarce resources
Evgeny Matusov | Maja Popovic | Richard Zens | Hermann Ney
Proceedings of the First International Workshop on Spoken Language Translation: Papers

pdf
Do We Need Chinese Word Segmentation for Statistical Machine Translation?
Jia Xu | Richard Zens | Hermann Ney
Proceedings of the Third SIGHAN Workshop on Chinese Language Processing

pdf
Error Measures and Bayes Decision Rules Revisited with Applications to POS Tagging
Hermann Ney | Maja Popović | David Sündermann
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

pdf
Improved Word Alignment Using a Symmetric Lexicon Model
Richard Zens | Evgeny Matusov | Hermann Ney
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf
Reordering Constraints for Phrase-Based Statistical Machine Translation
Richard Zens | Hermann Ney | Taro Watanabe | Eiichiro Sumita
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf
Symmetric Word Alignments for Statistical Machine Translation
Evgeny Matusov | Richard Zens | Hermann Ney
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf
Improving Word Alignment Quality using Morpho-syntactic Information
Hermann Ney | Maja Popovic
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf
Towards the Use of Word Stems and Suffixes for Statistical Machine Translation
Maja Popović | Hermann Ney
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf
Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information
Sonja Nießen | Hermann Ney
Computational Linguistics, Volume 30, Number 2, June 2004

pdf bib
The Alignment Template Approach to Statistical Machine Translation
Franz Josef Och | Hermann Ney
Computational Linguistics, Volume 30, Number 4, December 2004

pdf
Improvements in Phrase-Based Statistical Machine Translation
Richard Zens | Hermann Ney
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

2003

pdf
A Comparative Study on Reordering Constraints in Statistical Machine Translation
Richard Zens | Hermann Ney
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

pdf
Using POS Information for SMT into Morphologically Rich Languages
Nicola Ueffing | Hermann Ney
10th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Efficient Search for Interactive Statistical Machine Translation
Franz Josef Och | Richard Zens | Hermann Ney
10th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Comparison of Alignment Templates and Maximum Entropy Models for NLP
Oliver Bender | Klaus Macherey | Franz Josef Och | Hermann Ney
10th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Maximum Entropy Models for Named Entity Recognition
Oliver Bender | Franz Josef Och | Hermann Ney
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

Have we found the Holy Grail?
Hermann Ney
Proceedings of Machine Translation Summit IX: Plenaries

pdf abs
A novel string-to-string distance measure with applications to machine translation evaluation
Gregor Leusch | Nicola Ueffing | Hermann Ney
Proceedings of Machine Translation Summit IX: Papers

We introduce a string-to-string distance measure which extends the edit distance by block transpositions as a constant-cost edit operation. An algorithm for the calculation of this distance measure in polynomial time is presented. We then demonstrate how this distance measure can be used as an evaluation criterion in machine translation. The correlation between this evaluation criterion and human judgment is systematically compared with that of other automatic evaluation measures on two translation tasks. In general, like other automatic evaluation measures, the criterion shows low correlation at sentence level, but good correlation at system level.

pdf abs
Confidence measures for statistical machine translation
Nicola Ueffing | Klaus Macherey | Hermann Ney
Proceedings of Machine Translation Summit IX: Papers

In this paper, we present several confidence measures for (statistical) machine translation. We introduce word posterior probabilities for words in the target sentence that can be determined either on a word graph or on an N-best list. Two alternative confidence measures that can be calculated on N-best lists are proposed. The performance of the measures is evaluated on two different translation tasks: on spontaneously spoken dialogues from the domain of appointment scheduling, and on a collection of technical manuals.

pdf bib
A Systematic Comparison of Various Statistical Alignment Models
Franz Josef Och | Hermann Ney
Computational Linguistics, Volume 29, Number 1, March 2003

pdf
Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation
Christoph Tillmann | Hermann Ney
Computational Linguistics, Volume 29, Number 1, March 2003

2002

pdf
Generation of Word Graphs in Statistical Machine Translation
Nicola Ueffing | Franz Josef Och | Hermann Ney
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

pdf
Improving Alignment Quality in Statistical Machine Translation Using Context-dependent Maximum Entropy Models
Ismael García Varea | Franz J. Och | Hermann Ney | Francisco Casacuberta
COLING 2002: The 19th International Conference on Computational Linguistics

pdf
Discriminative Training and Maximum Entropy Models for Statistical Machine Translation
Franz Josef Och | Hermann Ney
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

pdf abs
Efficient integration of maximum entropy lexicon models within the training of statistical alignment models
Ismael García-Varea | Franz J. Och | Hermann Ney | Francisco Casacuberta
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers

Maximum entropy (ME) models have been successfully applied to many natural language problems. In this paper, we show how to integrate ME models efficiently within a maximum likelihood training scheme of statistical machine translation models. Specifically, we define a set of context-dependent ME lexicon models and we present how to perform an efficient training of these ME models within the conventional expectation-maximization (EM) training of statistical translation models. Experimental results are also given in order to demonstrate how these ME models improve the results obtained with the traditional translation models. The results are presented by means of alignment quality comparing the resulting alignments with manually annotated reference alignments.

2001

pdf
Stochastic Modelling: From Pattern Classification to Language Translation
Hermann Ney
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation

pdf
Toward hierarchical models for statistical machine translation of inflected languages
Sonja Niessen | Hermann Ney
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation

pdf
An Efficient A* Search Algorithm for Statistical Machine Translation
Franz Josef Och | Nicola Ueffing | Hermann Ney
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation

pdf
The RWTH System for Statistical Translation of Spoken Dialogues
H. Ney | F. J. Och | S. Vogel
Proceedings of the First International Conference on Human Language Technology Research

pdf abs
Morpho-syntactic analysis for reordering in statistical machine translation
Sonja Niessen | Hermann Ney
Proceedings of Machine Translation Summit VIII

In the framework of statistical machine translation (SMT), correspondences between the words in the source and the target language are learned from bilingual corpora on the basis of so-called alignment models. Among other things these are meant to capture the differences in word order in different languages. In this paper we show that SMT can take advantage of the explicit introduction of some linguistic knowledge about the sentence structure in the languages under consideration. In contrast to previous publications dealing with the incorporation of morphological and syntactic information into SMT, we focus on two aspects of reordering for the language pair German and English, namely question inversion and detachable German verb prefixes. The results of systematic experiments are reported and demonstrate the applicability of the approach to both translation directions on a German-English corpus.

pdf abs
Statistical multi-source translation
Franz Josef Och | Hermann Ney
Proceedings of Machine Translation Summit VIII

We describe methods for translating a text given in multiple source languages into a single target language. The goal is to improve translation quality in applications where the ultimate goal is to translate the same document into many languages. We describe a statistical approach and two specific statistical models to deal with this problem. Our method is generally applicable as it is independent of specific models, languages or application domains. We evaluate the approach on a multilingual corpus covering all eleven official European Union languages that was collected automatically from the Internet. In various tests we show that these methods can significantly improve translation quality. As a side effect, we also compare the quality of statistical machine translation systems for many European languages in the same domain.

pdf abs
What can machine translation learn from speech recognition?
Franz Josef Och | Hermann Ney
Workshop on MT2010: Towards a Road Map for MT

The performance of machine translation technology after 50 years of development leaves much to be desired. There is a high demand for well-performing and cheap MT systems for many language pairs and domains, which automatically adapt to rapidly changing terminology. We argue that for successful MT systems it will be crucial to apply data-driven methods, especially statistical machine translation. In addition, it will be very important to establish common test environments. This includes the availability of large parallel training corpora, well-defined test corpora and standardized evaluation criteria. Thereby research results can be compared and this will open the possibility for more competition in MT research.

pdf
Refined Lexicon Models for Statistical Machine Translation using a Maximum Entropy Approach
Ismael García-Varea | Franz J. Och | Hermann Ney | Francisco Casacuberta
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

pdf
Statistical Machine Translation
Franz Josef Och | Hermann Ney
5th EAMT Workshop: Harvesting Existing Resources

pdf abs
On the Use of Grammar Based Language Models for Statistical Machine Translation
Hassan Sawaf | Kai Schütz | Hermann Ney
Proceedings of the Sixth International Workshop on Parsing Technologies

In this paper, we describe some concepts of language models beyond the usually used standard trigram and use such language models for statistical machine translation. In statistical machine translation the language model is the a-priori knowledge source of the system about the target language. One important requirement for the language model is the correct word order, given a certain choice of words, and to score the translations generated by the translation model Pr(f_1^J | e_1^I), in view of the syntactic context. In addition to standard m-grams with long histories, we examine the use of Part-of-Speech based models as well as linguistically motivated grammars with stochastic parsing as a special type of language model. Translation results are given on the VERBMOBIL task, where translation is performed from German to English, with vocabulary sizes of 6500 and 4000 words, respectively.

pdf
An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research
Sonja Nießen | Franz Josef Och | Gregor Leusch | Hermann Ney
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf
Translation with Cascaded Finite State Transducers
Stephan Vogel | Hermann Ney
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

pdf
Improved Statistical Alignment Models
Franz Josef Och | Hermann Ney
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

pdf
Word Re-ordering and DP-based Search in Statistical Machine Translation
Christoph Tillmann | Hermann Ney
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

pdf
Improving SMT quality with morpho-syntactic analysis
Sonja Nießen | Hermann Ney
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

pdf
A Comparison of Alignment Models for Statistical Machine Translation
Franz Josef Och | Hermann Ney
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

pdf
Construction of a Hierarchical Translation Memory
S. Vogel | H. Ney
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics