Tomoya Yanagita
2023
NAIST Simultaneous Speech-to-speech Translation System for IWSLT 2023
Ryo Fukuda | Yuta Nishikawa | Yasumasa Kano | Yuka Ko | Tomoya Yanagita | Kosuke Doi | Mana Makinae | Sakriani Sakti | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
This paper describes NAIST’s submission to the IWSLT 2023 Simultaneous Speech Translation task: English-to-German, English-to-Japanese, and English-to-Chinese speech-to-text translation, and English-to-Japanese speech-to-speech translation. Our speech-to-text system uses an end-to-end multilingual speech translation model based on large-scale pre-trained speech and text models. We add Inter-connections to the model to incorporate the outputs of intermediate layers of the pre-trained speech model, and we augment prefix-to-prefix text data using Bilingual Prefix Alignment to enhance the simultaneity of the offline speech translation model. Our speech-to-speech system employs an incremental text-to-speech module that consists of a Japanese pronunciation estimation model, an acoustic model, and a neural vocoder.