Abstract
This paper describes the system we submitted to the IWSLT 2023 multilingual speech translation track, which takes English speech as input and produces text in 10 target languages. The system combines a CNN with a Transformer: the convolutional layers downsample the speech features and extract local information, while the Transformer extracts global features and generates the final output. We first pre-train the encoder parameters on a speech recognition task, then train the multilingual speech translation model on a speech translation corpus. We also apply further optimizations, including data augmentation and model ensembling. The system obtains satisfactory results on the MuST-C test sets for all 10 languages.
- Anthology ID:
- 2023.iwslt-1.44
- Volume:
- Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada (in-person and online)
- Editors:
- Elizabeth Salesky, Marcello Federico, Marine Carpuat
- Venue:
- IWSLT
- SIG:
- SIGSLT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 455–460
- URL:
- https://aclanthology.org/2023.iwslt-1.44
- DOI:
- 10.18653/v1/2023.iwslt-1.44
- Cite (ACL):
- Zhipeng Wang, Yuhang Guo, and Shuoying Chen. 2023. BIT’s System for Multilingual Track. In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), pages 455–460, Toronto, Canada (in-person and online). Association for Computational Linguistics.
- Cite (Informal):
- BIT’s System for Multilingual Track (Wang et al., IWSLT 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2023.iwslt-1.44.pdf