Zhiqi Wang


2018

The JHU/KyotoU Speech Translation System for IWSLT 2018
Hirofumi Inaguma | Xuan Zhang | Zhiqi Wang | Adithya Renduchintala | Shinji Watanabe | Kevin Duh
Proceedings of the 15th International Conference on Spoken Language Translation

This paper describes the Johns Hopkins University (JHU) and Kyoto University submissions to the Speech Translation evaluation campaign at IWSLT 2018. Our end-to-end speech translation systems are based on ESPnet and implement an attention-based encoder-decoder model. For comparison, we also experiment with a pipeline system that uses independent neural network systems for the speech transcription and text translation components. We find that a transfer learning approach, which bootstraps the end-to-end speech translation system with the parameters of a speech transcription system, is important for training on small datasets.
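
The transfer learning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the ESPnet API: parameters are modeled as plain Python dictionaries, and the function name `transfer_parameters`, the parameter names, and the prefix filter are all hypothetical. The abstract does not say which components are transferred, so the sketch simply copies every parameter whose name and size match, with an optional filter.

```python
from typing import Dict, List, Tuple

# Assumption: a model is represented as a mapping from parameter name to a
# flat list of floats. Real toolkits use tensors; this is only illustrative.
Params = Dict[str, List[float]]


def transfer_parameters(
    asr_params: Params,
    st_params: Params,
    prefixes: Tuple[str, ...] = ("",),
) -> Tuple[Params, List[str]]:
    """Initialize a speech translation model from a speech transcription model.

    Returns a copy of st_params in which every entry that also exists in
    asr_params, has the same size, and starts with one of `prefixes` is
    overwritten by the ASR value. Also returns the transferred names.
    The default prefix "" matches every name.
    """
    initialized = {name: list(values) for name, values in st_params.items()}
    transferred = []
    for name, values in asr_params.items():
        if not name.startswith(prefixes):
            continue  # excluded by the (hypothetical) prefix filter
        if name in initialized and len(initialized[name]) == len(values):
            initialized[name] = list(values)
            transferred.append(name)
    return initialized, transferred


# Hypothetical parameter names, for illustration only.
asr = {"encoder.w": [1.0, 2.0], "decoder.w": [3.0], "encoder.extra": [9.0]}
st = {"encoder.w": [0.0, 0.0], "decoder.w": [0.0], "encoder.other": [0.0]}

st_init, moved = transfer_parameters(asr, st, prefixes=("encoder.",))
```

Parameters left untouched keep their original initialization, and training on the speech translation data would then start from `st_init` rather than from scratch.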