End-to-End Simultaneous Speech Translation with Pretraining and Distillation: Huawei Noah’s System for AutoSimTranS 2022

Xingshan Zeng, Pengfei Li, Liangyou Li, Qun Liu


Abstract
This paper describes the system submitted to AutoSimTrans 2022 from Huawei Noah’s Ark Lab, which won first place in the audio input track of the Chinese-English translation task. Our system is based on RealTranS, an end-to-end simultaneous speech translation model. We enhance the model with pretraining, initializing the acoustic encoder with a pretrained ASR encoder, and the semantic encoder and decoder with a pretrained NMT encoder and decoder, respectively. To relieve data scarcity, we further construct a pseudo training corpus, as a form of knowledge distillation, from ASR data and the pretrained NMT model. We also apply several techniques to improve robustness and domain generalizability, including punctuation removal, token-level knowledge distillation, and multi-domain finetuning. Experiments show that our system significantly outperforms the baselines at all latency levels and verify the effectiveness of our proposed methods.
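The pretraining scheme in the abstract maps components of pretrained models onto the end-to-end ST architecture. A minimal sketch of that mapping, using plain parameter dictionaries in place of real checkpoints (all names here are illustrative, not taken from the paper's code):

```python
def init_st_from_pretrained(asr_state, nmt_state):
    """Build an ST model's parameter dict from pretrained checkpoints,
    mirroring the scheme described in the abstract:
      acoustic encoder  <- ASR encoder
      semantic encoder  <- NMT encoder
      decoder           <- NMT decoder
    Parameters that do not fit these namespaces (e.g. ASR output heads)
    are simply dropped."""
    st_state = {}
    for name, weight in asr_state.items():
        if name.startswith("encoder."):
            st_state["acoustic_encoder." + name[len("encoder."):]] = weight
    for name, weight in nmt_state.items():
        if name.startswith("encoder."):
            st_state["semantic_encoder." + name[len("encoder."):]] = weight
        elif name.startswith("decoder."):
            st_state["decoder." + name[len("decoder."):]] = weight
    return st_state

# Toy parameter dicts standing in for real ASR and NMT checkpoints.
asr = {"encoder.layer0.w": 1.0, "ctc_head.w": 2.0}
nmt = {"encoder.layer0.w": 3.0, "decoder.layer0.w": 4.0}
st = init_st_from_pretrained(asr, nmt)
# st -> {"acoustic_encoder.layer0.w": 1.0,
#        "semantic_encoder.layer0.w": 3.0,
#        "decoder.layer0.w": 4.0}
```

In a real framework the same mapping would be applied to checkpoint state dicts before loading them into the ST model, with any remaining parameters (e.g. the cross-attention between acoustic and semantic encoders) left randomly initialized.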
Anthology ID:
2022.autosimtrans-1.5
Volume:
Proceedings of the Third Workshop on Automatic Simultaneous Translation
Month:
July
Year:
2022
Address:
Online
Venue:
AutoSimTrans
Publisher:
Association for Computational Linguistics
Pages:
25–33
URL:
https://aclanthology.org/2022.autosimtrans-1.5
DOI:
10.18653/v1/2022.autosimtrans-1.5
Bibkey:
Cite (ACL):
Xingshan Zeng, Pengfei Li, Liangyou Li, and Qun Liu. 2022. End-to-End Simultaneous Speech Translation with Pretraining and Distillation: Huawei Noah’s System for AutoSimTranS 2022. In Proceedings of the Third Workshop on Automatic Simultaneous Translation, pages 25–33, Online. Association for Computational Linguistics.
Cite (Informal):
End-to-End Simultaneous Speech Translation with Pretraining and Distillation: Huawei Noah’s System for AutoSimTranS 2022 (Zeng et al., AutoSimTrans 2022)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.autosimtrans-1.5.pdf
Data
BSTC