On Using SpecAugment for End-to-End Speech Translation

Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney

Abstract

This work investigates a simple data augmentation technique, SpecAugment, for end-to-end speech translation. SpecAugment is a low-cost method that is applied directly to the audio input features and masks blocks of frequency channels and/or time steps. Applied to end-to-end speech translation tasks, it yields gains of up to +2.2% BLEU on LibriSpeech Audiobooks En→Fr and +1.2% on IWSLT TED-talks En→De by alleviating overfitting to some extent. We also examine the method in a variety of data scenarios and show that it leads to significant improvements irrespective of the amount of training data.
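The masking described in the abstract can be illustrated with a minimal sketch. It assumes the input is a `(time, frequency)` NumPy array of audio features and that masked regions are filled with zeros. The function name `spec_augment`, its parameters, and the default mask widths are hypothetical choices for illustration, not the settings or implementation used in the paper.

```python
import numpy as np


def spec_augment(features, max_freq_width=8, max_time_width=20,
                 num_freq_masks=1, num_time_masks=1, rng=None):
    """Return a copy of `features` with random frequency and time blocks zeroed.

    `features` is assumed to have shape (num_frames, num_channels).
    All parameter names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = features.copy()
    num_frames, num_channels = out.shape

    # Frequency masking: zero a block of consecutive channels for all frames.
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, min(max_freq_width, num_channels) + 1))
        start = int(rng.integers(0, num_channels - width + 1))
        out[:, start:start + width] = 0.0

    # Time masking: zero a block of consecutive frames for all channels.
    for _ in range(num_time_masks):
        width = int(rng.integers(0, min(max_time_width, num_frames) + 1))
        start = int(rng.integers(0, num_frames - width + 1))
        out[start:start + width, :] = 0.0

    return out


# Usage: 100 frames of 40-dimensional features, masked with a fixed seed.
features = np.ones((100, 40))
masked = spec_augment(features, rng=np.random.default_rng(0))
```

Because the masks are drawn anew on every call, applying such a function to each training batch presents the model with a differently corrupted view of the same utterance, which is one plausible reading of how the method alleviates overfitting.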

Anthology ID: 2019.iwslt-1.22
Volume: Proceedings of the 16th International Conference on Spoken Language Translation
Month: November 2-3
Year: 2019
Address: Hong Kong
Editors: Jan Niehues, Rolando Cattoni, Sebastian Stüker, Matteo Negri, Marco Turchi, Thanh-Le Ha, Elizabeth Salesky, Ramon Sanabria, Loic Barrault, Lucia Specia, Marcello Federico
Venue: IWSLT
SIG: SIGSLT
Publisher: Association for Computational Linguistics
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2019.iwslt-1.22/
Cite (ACL): Parnia Bahar, Albert Zeyer, Ralf Schlüter, and Hermann Ney. 2019. On Using SpecAugment for End-to-End Speech Translation. In Proceedings of the 16th International Conference on Spoken Language Translation, Hong Kong. Association for Computational Linguistics.
Cite (Informal): On Using SpecAugment for End-to-End Speech Translation (Bahar et al., IWSLT 2019)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2019.iwslt-1.22.pdf
