CMU’s IWSLT 2023 Simultaneous Speech Translation System
Brian Yan, Jiatong Shi, Soumi Maiti, William Chen, Xinjian Li, Yifan Peng, Siddhant Arora, Shinji Watanabe
Abstract
This paper describes CMU’s submission to the IWSLT 2023 simultaneous speech translation shared task for translating English speech to both German text and speech in a streaming fashion. We first build offline speech-to-text (ST) models using the joint CTC/attention framework. These models also use WavLM front-end features and mBART decoder initialization. We adapt our offline ST models for simultaneous speech-to-text translation (SST) by 1) incrementally encoding chunks of input speech, re-computing encoder states for each new chunk and 2) incrementally decoding output text, pruning beam search hypotheses to 1-best after processing each chunk. We then build text-to-speech (TTS) models using the VITS framework and achieve simultaneous speech-to-speech translation (SS2ST) by cascading our SST and TTS models.
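The chunk-wise decoding loop described in the abstract can be illustrated with a minimal sketch, assuming a toy encoder and decoder: whenever a new speech chunk arrives, the encoder is re-run over the entire speech prefix, a short beam search extends the current translation, and the beam is pruned to its single best hypothesis before the next chunk. This is not the authors' code, and the `ToyEncoder`/`ToyDecoder` classes below are hypothetical placeholders rather than ESPnet APIs.

```python
"""Minimal sketch of chunk-wise simultaneous decoding with re-computed
encoder states and 1-best pruning (illustrative only, not the paper's code)."""
from dataclasses import dataclass, field
import random


@dataclass
class Hypothesis:
    tokens: list = field(default_factory=list)
    score: float = 0.0


class ToyEncoder:
    """Hypothetical stand-in: the real system uses a WavLM front-end and an ST encoder."""
    def __call__(self, speech_prefix):
        # Re-encode everything received so far; here we just flatten the frames.
        return [frame for chunk in speech_prefix for frame in chunk]


class ToyDecoder:
    """Hypothetical stand-in for a joint CTC/attention token scorer."""
    vocab = ["▁wir", "▁bauen", "▁systeme", "<eos>"]

    def score(self, enc, prefix):
        random.seed(len(enc) + len(prefix))  # deterministic toy scores
        return {tok: -random.random() for tok in self.vocab}


def simultaneous_decode(chunks, encoder, decoder, beam_size=4, steps_per_chunk=2):
    speech_prefix = []
    best = Hypothesis()
    for chunk in chunks:
        speech_prefix.append(chunk)
        enc = encoder(speech_prefix)          # 1) re-compute encoder states for the new chunk
        beam = [best]                         # resume from the pruned 1-best hypothesis
        for _ in range(steps_per_chunk):      # 2) incrementally extend the output text
            candidates = []
            for hyp in beam:
                for tok, s in decoder.score(enc, hyp.tokens).items():
                    candidates.append(Hypothesis(hyp.tokens + [tok], hyp.score + s))
            beam = sorted(candidates, key=lambda h: h.score, reverse=True)[:beam_size]
        best = beam[0]                        # prune to 1-best after processing the chunk
        print(f"after chunk {len(speech_prefix)}: {' '.join(best.tokens)}")
    return best


if __name__ == "__main__":
    fake_chunks = [[0.1] * 160 for _ in range(3)]  # three dummy speech chunks
    simultaneous_decode(fake_chunks, ToyEncoder(), ToyDecoder())
```

In the submitted system the pruned 1-best prefix would then be passed to the VITS TTS model to produce the simultaneous speech output.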
- Anthology ID: 2023.iwslt-1.20
- Volume: Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
- Month: July
- Year: 2023
- Address: Toronto, Canada (in-person and online)
- Editors: Elizabeth Salesky, Marcello Federico, Marine Carpuat
- Venue: IWSLT
- SIG: SIGSLT
- Publisher: Association for Computational Linguistics
- Pages: 235–240
- URL: https://aclanthology.org/2023.iwslt-1.20
- DOI: 10.18653/v1/2023.iwslt-1.20
- Cite (ACL): Brian Yan, Jiatong Shi, Soumi Maiti, William Chen, Xinjian Li, Yifan Peng, Siddhant Arora, and Shinji Watanabe. 2023. CMU’s IWSLT 2023 Simultaneous Speech Translation System. In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), pages 235–240, Toronto, Canada (in-person and online). Association for Computational Linguistics.
- Cite (Informal): CMU’s IWSLT 2023 Simultaneous Speech Translation System (Yan et al., IWSLT 2023)
- PDF: https://preview.aclanthology.org/ingest-2024-clasp/2023.iwslt-1.20.pdf