Luzhen Xu


2023

Submission of USTC’s System for the IWSLT 2023 - Offline Speech Translation Track
Xinyuan Zhou | Jianwei Cui | Zhongyi Ye | Yichi Wang | Luzhen Xu | Hanyi Zhang | Weitai Zhang | Lirong Dai
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper describes the submissions of the research group USTC-NELSLIP to the 2023 IWSLT Offline Speech Translation competition, which involves translating spoken English into written Chinese. We utilize both cascaded models and end-to-end models for this task. To improve the performance of the cascaded models, we introduce Whisper to reduce errors in the intermediate source language text, achieving a significant improvement in ASR performance. For end-to-end models, we propose the Stacked Acoustic-and-Textual Encoding extension (SATE-ex), which feeds the output of the acoustic decoder into the textual decoder to fuse information and prevent error propagation. Additionally, we improve the speech translation performance of the end-to-end system by ensembling the SATE-ex model with the encoder-decoder model.