Kosuke Doi


2023

pdf
NAIST Simultaneous Speech-to-speech Translation System for IWSLT 2023
Ryo Fukuda | Yuta Nishikawa | Yasumasa Kano | Yuka Ko | Tomoya Yanagita | Kosuke Doi | Mana Makinae | Sakriani Sakti | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper describes NAIST’s submission to the IWSLT 2023 Simultaneous Speech Translation task: English-to-German, Japanese, Chinese speech-to-text translation and English-to-Japanese speech-to-speech translation. Our speech-to-text system uses an end-to-end multilingual speech translation model based on large-scale pre-trained speech and text models. We add Inter-connections into the model to incorporate the outputs from intermediate layers of the pre-trained speech model and augment prefix-to-prefix text data using Bilingual Prefix Alignment to enhance the simultaneity of the offline speech translation model. Our speech-to-speech system employs an incremental text-to-speech module that consists of a Japanese pronunciation estimation model, an acoustic model, and a neural vocoder.

2022

pdf
NAIST Simultaneous Speech-to-Text Translation System for IWSLT 2022
Ryo Fukuda | Yuka Ko | Yasumasa Kano | Kosuke Doi | Hirotaka Tokuyama | Sakriani Sakti | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper describes NAIST’s simultaneous speech translation systems developed for IWSLT 2022 Evaluation Campaign. We participated the speech-to-speech track for English-to-German and English-to-Japanese. Our primary submissions were end-to-end systems using adaptive segmentation policies based on Prefix Alignment.

2021

pdf
NAIST English-to-Japanese Simultaneous Translation System for IWSLT 2021 Simultaneous Text-to-text Task
Ryo Fukuda | Yui Oka | Yasumasa Kano | Yuki Yano | Yuka Ko | Hirotaka Tokuyama | Kosuke Doi | Sakriani Sakti | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes NAIST’s system for the English-to-Japanese Simultaneous Text-to-text Translation Task in IWSLT 2021 Evaluation Campaign. Our primary submission is based on wait-k neural machine translation with sequence-level knowledge distillation to encourage literal translation.

pdf
Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data
Kosuke Doi | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes the construction of a new large-scale English-Japanese Simultaneous Interpretation (SI) corpus and presents the results of its analysis. A portion of the corpus contains SI data from three interpreters with different amounts of experience. Some of the SI data were manually aligned with the source speeches at the sentence level. Their latency, quality, and word order aspects were compared among the SI data themselves as well as against offline translations. The results showed that (1) interpreters with more experience controlled the latency and quality better, and (2) large latency hurt the SI quality.