Tomoki Hayashi
2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the broadening interests of the spoken language translation community. ESPnet-ST-v2 supports 1) offline speech-to-text translation (ST), 2) simultaneous speech-to-text translation (SST), and 3) offline speech-to-speech translation (S2ST). Each task is supported with a wide variety of approaches, differentiating ESPnet-ST-v2 from other open-source spoken language translation toolkits. This toolkit offers state-of-the-art architectures such as transducers, hybrid CTC/attention, multi-decoders with searchable intermediates, time-synchronous blockwise CTC/attention, Translatotron models, and direct discrete unit models. In this paper, we describe the overall design, example models for each task, and performance benchmarking behind ESPnet-ST-v2, which is publicly available at https://github.com/espnet/espnet.
2020
ESPnet-ST: All-in-One Speech Translation Toolkit
Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Yalta, Tomoki Hayashi, Shinji Watanabe
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
We present ESPnet-ST, which is designed for the quick development of speech-to-speech translation systems in a single framework. ESPnet-ST is a new project inside the end-to-end speech processing toolkit ESPnet, which integrates or newly implements automatic speech recognition, machine translation, and text-to-speech functions for speech translation. We provide all-in-one recipes, including data pre-processing, feature extraction, training, and decoding pipelines, for a wide range of benchmark datasets. Our reproducible results can match or even outperform the current state-of-the-art performances, and these pre-trained models are downloadable. The toolkit is publicly available at https://github.com/espnet/espnet.