Tomoya Yanagita


2023

pdf
NAIST Simultaneous Speech-to-speech Translation System for IWSLT 2023
Ryo Fukuda | Yuta Nishikawa | Yasumasa Kano | Yuka Ko | Tomoya Yanagita | Kosuke Doi | Mana Makinae | Sakriani Sakti | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper describes NAIST’s submission to the IWSLT 2023 Simultaneous Speech Translation task: English-to-German, Japanese, Chinese speech-to-text translation and English-to-Japanese speech-to-speech translation. Our speech-to-text system uses an end-to-end multilingual speech translation model based on large-scale pre-trained speech and text models. We add Inter-connections into the model to incorporate the outputs from intermediate layers of the pre-trained speech model and augment prefix-to-prefix text data using Bilingual Prefix Alignment to enhance the simultaneity of the offline speech translation model. Our speech-to-speech system employs an incremental text-to-speech module that consists of a Japanese pronunciation estimation model, an acoustic model, and a neural vocoder.