Weitai Zhang


2023

The USTC’s Dialect Speech Translation System for IWSLT 2023
Pan Deng | Shihao Chen | Weitai Zhang | Jie Zhang | Lirong Dai
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper presents the USTC system for the IWSLT 2023 Dialectal and Low-resource shared task, which involves translation from Tunisian Arabic to English. We aim to investigate the mutual transfer between Tunisian Arabic and Modern Standard Arabic (MSA) to enhance the performance of speech translation (ST) by following standard pre-training and fine-tuning pipelines. We synthesize a substantial amount of pseudo Tunisian-English paired data using a multi-step pre-training approach. Integrating a Tunisian-MSA translation module into the end-to-end ST model enables the transfer from Tunisian to MSA and facilitates linguistic normalization of the dialect. To increase the robustness of the ST system, we optimize the model’s ability to adapt to ASR errors and propose a model ensemble method. Results indicate that applying the dialect transfer method can increase the BLEU score of dialectal ST. The best system ensembles both cascaded and end-to-end ST models, achieving BLEU improvements of 2.4 and 2.8 on the test1 and test2 sets, respectively, compared to the best published system.

Submission of USTC’s System for the IWSLT 2023 - Offline Speech Translation Track
Xinyuan Zhou | Jianwei Cui | Zhongyi Ye | Yichi Wang | Luzhen Xu | Hanyi Zhang | Weitai Zhang | Lirong Dai
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper describes the submissions of the research group USTC-NELSLIP to the 2023 IWSLT Offline Speech Translation competition, which involves translating spoken English into written Chinese. We utilize both cascaded models and end-to-end models for this task. To improve the performance of the cascaded models, we introduce Whisper to reduce errors in the intermediate source language text, achieving a significant improvement in ASR performance. For end-to-end models, we propose the Stacked Acoustic-and-Textual Encoding extension (SATE-ex), which feeds the output of the acoustic decoder into the textual decoder for information fusion and to prevent error propagation. Additionally, we improve the performance of the end-to-end system by ensembling the SATE-ex model with the encoder-decoder model.

2022

The USTC-NELSLIP Offline Speech Translation Systems for IWSLT 2022
Weitai Zhang | Zhongyi Ye | Haitao Tang | Xiaoxi Li | Xinyuan Zhou | Jing Yang | Jianwei Cui | Pan Deng | Mohan Shi | Yifan Song | Dan Liu | Junhua Liu | Lirong Dai
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

This paper describes USTC-NELSLIP’s submissions to the IWSLT 2022 Offline Speech Translation task, including speech translation of talks from English to German, English to Chinese and English to Japanese. We describe both cascaded architectures and end-to-end models which can directly translate source speech into target text. In the cascaded condition, we investigate the effectiveness of different model architectures with robust training and achieve an improvement of 2.72 BLEU over last year’s optimal system on the MuST-C English-German test set. In the end-to-end condition, we build models based on Transformer and Conformer architectures, achieving an improvement of 2.26 BLEU over last year’s optimal end-to-end system. The end-to-end system obtains promising results, but it still lags behind our cascaded models.