Albert Zeyer
2019
On Using SpecAugment for End-to-End Speech Translation
Parnia Bahar | Albert Zeyer | Ralf Schlüter | Hermann Ney
Proceedings of the 16th International Conference on Spoken Language Translation
This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation. SpecAugment is a low-cost method applied directly to the audio input features, consisting of masking blocks of frequency channels and/or time steps. We apply SpecAugment to end-to-end speech translation tasks and achieve up to +2.2% BLEU on LibriSpeech Audiobooks En→Fr and +1.2% on IWSLT TED-talks En→De by alleviating overfitting to some extent. We also examine the effectiveness of the method in a variety of data scenarios and show that it leads to significant improvements irrespective of the amount of training data.
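A minimal sketch of the masking operation described above, assuming the input is a (time, frequency) array of audio features such as log-mel filterbanks; the mask counts and maximum widths are illustrative defaults, not the values used in the paper.

```python
import numpy as np

def spec_augment(features, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, rng=None):
    """Mask random blocks of frequency channels and time steps.

    `features`: array of shape (time, freq). Returns a masked copy.
    """
    rng = rng or np.random.default_rng()
    x = features.copy()
    num_time, num_freq = x.shape

    # Frequency masking: zero out a block of consecutive channels.
    for _ in range(num_freq_masks):
        width = rng.integers(0, max_freq_width + 1)
        start = rng.integers(0, max(1, num_freq - width))
        x[:, start:start + width] = 0.0

    # Time masking: zero out a block of consecutive frames.
    for _ in range(num_time_masks):
        width = rng.integers(0, max_time_width + 1)
        start = rng.integers(0, max(1, num_time - width))
        x[start:start + width, :] = 0.0

    return x
```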
2018
RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition
Albert Zeyer | Tamer Alkhouli | Hermann Ney
Proceedings of ACL 2018, System Demonstrations
We demonstrate the fast training and decoding speed of RETURNN for attention models in translation, enabled by fast CUDA LSTM kernels and a fast pure TensorFlow beam search decoder. We show that a layer-wise pretraining scheme for recurrent attention models gives over 1% absolute BLEU improvement and allows training deeper recurrent encoder networks. Promising preliminary results on maximum expected BLEU training are presented. We are able to train state-of-the-art models for translation and end-to-end models for speech recognition, and show results on WMT 2017 and Switchboard. The flexibility of RETURNN allows a fast research feedback loop for experimenting with alternative architectures, and its generality allows it to be used in a wide range of applications.
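A minimal sketch of the layer-wise pretraining idea mentioned above: start with a shallow recurrent encoder and grow its depth every few epochs. The schedule below is illustrative; `build_encoder` is a hypothetical helper (not the RETURNN API) that would construct the encoder and reuse the weights of the layers trained so far.

```python
def pretrain_num_layers(epoch, final_layers=6, epochs_per_stage=2):
    """Return the encoder depth to use at a given 1-based epoch."""
    grown = 1 + (epoch - 1) // epochs_per_stage
    return min(final_layers, grown)

for epoch in range(1, 13):
    num_layers = pretrain_num_layers(epoch)
    # encoder = build_encoder(num_layers)  # hypothetical: copies lower-layer weights
    print(f"epoch {epoch}: training with {num_layers} encoder layers")
```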
Neural Speech Translation at AppTek
Evgeny Matusov | Patrick Wilken | Parnia Bahar | Julian Schamper | Pavel Golik | Albert Zeyer | Joan Albert Silvestre-Cerda | Adrià Martínez-Villaronga | Hendrik Pesch | Jan-Thorsten Peter
Proceedings of the 15th International Conference on Spoken Language Translation
This work describes AppTek’s speech translation pipeline that includes strong state-of-the-art automatic speech recognition (ASR) and neural machine translation (NMT) components. We show how these components can be tightly coupled by encoding ASR confusion networks, as well as ASR-like noise adaptation, vocabulary normalization, and implicit punctuation prediction during translation. In another experimental setup, we propose a direct speech translation approach that can be scaled to translation tasks with large amounts of text-only parallel training data but a limited number of hours of recorded and human-translated speech.
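A minimal sketch of one way to feed an ASR confusion network into an NMT encoder: each slot holds competing token hypotheses with posteriors, and the slot is represented as the posterior-weighted sum of their embeddings. This is an illustrative encoding scheme under that assumption, not necessarily the exact coupling used in the paper; the vocabulary and embeddings are toy placeholders.

```python
import numpy as np

# Toy vocabulary and randomly initialized embeddings (placeholders).
vocab = {"<eps>": 0, "i": 1, "eye": 2, "see": 3, "sea": 4}
emb_dim = 8
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), emb_dim))

# A confusion network: a sequence of slots, each a list of (token, posterior).
confusion_network = [
    [("i", 0.7), ("eye", 0.3)],
    [("see", 0.6), ("sea", 0.3), ("<eps>", 0.1)],
]

def encode_slot(slot):
    """Posterior-weighted sum of the embeddings of a slot's alternatives."""
    vecs = np.stack([embeddings[vocab[tok]] * p for tok, p in slot])
    return vecs.sum(axis=0)

encoder_input = np.stack([encode_slot(s) for s in confusion_network])
print(encoder_input.shape)  # (num_slots, emb_dim), fed to the NMT encoder
```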