Zhiqi Wang

2018

This paper describes the Johns Hopkins University (JHU) and Kyoto University submissions to the Speech Translation evaluation campaign at IWSLT2018. Our end-to-end speech translation systems are based on ESPnet and implement an attention-based encoder-decoder model. For comparison, we also experiment with a pipeline system that uses independent neural network systems for the speech transcription and text translation components. We find that a transfer learning approach that bootstraps the end-to-end speech translation system with the speech transcription system’s parameters is important for training on small datasets.

Co-authors

Hirofumi Inaguma
Xuan Zhang
Adithya Renduchintala
Shinji Watanabe
Kevin Duh

Venues

IWSLT