2023
CTC-based Non-autoregressive Speech Translation
Chen Xu | Xiaoqian Liu | Xiaowen Liu | Qingxuan Sun | Yuhao Zhang | Murun Yang | Qianqian Dong | Tom Ko | Mingxuan Wang | Tong Xiao | Anxiang Ma | Jingbo Zhu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Combining end-to-end speech translation (ST) with non-autoregressive (NAR) generation is promising for language and speech processing, offering less error propagation and low latency. In this paper, we investigate the potential of connectionist temporal classification (CTC) for non-autoregressive speech translation (NAST). In particular, we develop a model consisting of two encoders that are guided by CTC to predict the source and target texts, respectively. Introducing CTC into NAST on both language sides poses obvious challenges: 1) the conditionally independent generation somewhat breaks the interdependency among tokens, and 2) the monotonic alignment assumption in standard CTC does not hold in translation tasks. In response, we develop a prediction-aware encoding approach and a cross-layer attention approach to address these issues. We also use curriculum learning to improve the convergence of training. Experiments on the MuST-C ST benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67x, which is comparable to the autoregressive counterpart and even outperforms the previous best result by 0.9 BLEU points.
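As a rough sketch of how CTC supervision can be attached to both encoders, the following PyTorch fragment may help. It is a minimal illustration under assumed names (DualCTCNAST, the 6-layer encoders, shared input lengths), not the authors' released implementation, and it omits the prediction-aware encoding and cross-layer attention components the abstract mentions.

import torch
import torch.nn as nn

class DualCTCNAST(nn.Module):
    # Hypothetical two-encoder sketch: the first encoder is supervised by
    # CTC on the source transcript, the second by CTC on the target text.
    def __init__(self, input_dim, hidden_dim, src_vocab, tgt_vocab):
        super().__init__()
        def layer():
            return nn.TransformerEncoderLayer(hidden_dim, nhead=8, batch_first=True)
        self.proj_in = nn.Linear(input_dim, hidden_dim)
        self.src_encoder = nn.TransformerEncoder(layer(), num_layers=6)
        self.tgt_encoder = nn.TransformerEncoder(layer(), num_layers=6)
        self.src_head = nn.Linear(hidden_dim, src_vocab)  # source-side CTC head
        self.tgt_head = nn.Linear(hidden_dim, tgt_vocab)  # target-side CTC head
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, feats, feat_lens, src, src_lens, tgt, tgt_lens):
        h_src = self.src_encoder(self.proj_in(feats))   # (B, T, H)
        h_tgt = self.tgt_encoder(h_src)                 # (B, T, H)
        # nn.CTCLoss expects (T, B, V) log-probabilities.
        src_lp = self.src_head(h_src).log_softmax(-1).transpose(0, 1)
        tgt_lp = self.tgt_head(h_tgt).log_softmax(-1).transpose(0, 1)
        loss_src = self.ctc(src_lp, src, feat_lens, src_lens)
        loss_tgt = self.ctc(tgt_lp, tgt, feat_lens, tgt_lens)
        return loss_src + loss_tgt

At inference, the target CTC head can be decoded in a single pass (greedy collapse of repeats and blanks), which is where the reported speed-up over autoregressive decoding comes from.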
2022
The NiuTrans Machine Translation Systems for WMT22
Weiqiao Shan | Zhiquan Cao | Yuchen Han | Siming Wu | Yimin Hu | Jie Wang | Yi Zhang | Hou Baoyu | Hang Cao | Chenghao Gao | Xiaowen Liu | Tong Xiao | Anxiang Ma | Jingbo Zhu
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper describes the NiuTrans neural machine translation systems for the WMT22 General MT constrained task. We participate in four translation directions: Chinese→English, English→Croatian, and Livonian↔English. Our models are based on several advanced Transformer variants, e.g., the Transformer-ODE and the Universal Multiscale Transformer (UMST). The main workflow consists of data filtering, large-scale data augmentation (i.e., iterative back-translation and iterative knowledge distillation), and domain-specific fine-tuning. Moreover, we try several multi-domain methods, such as a multi-domain model structure and a multi-domain data clustering method, to address this year's newly proposed multi-domain test set challenge. For low-resource scenarios, we build a multilingual translation model to enhance performance, and experiment with initializing the translation model from the pre-trained language model mBERT.
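The augmentation stage follows the usual iterative back-translation recipe; a heavily simplified Python sketch is given below, where `train` stands in for a full NMT training run and all names are illustrative placeholders rather than NiuTrans tooling (data filtering and the distillation step are omitted).

from typing import Callable, List, Tuple

Pair = Tuple[str, str]
Translator = Callable[[str], str]

def iterative_back_translation(
    parallel: List[Pair],            # real (src, tgt) sentence pairs
    mono_tgt: List[str],             # monolingual target-side text
    train: Callable[[List[Pair]], Translator],
    rounds: int = 2,
) -> Translator:
    """Each round trains a reverse (tgt -> src) model, back-translates the
    monolingual data into synthetic pairs, and retrains the forward model
    on real + synthetic data."""
    data = list(parallel)
    forward = train(data)
    for _ in range(rounds):
        reverse = train([(t, s) for s, t in data])      # reverse model
        synthetic = [(reverse(t), t) for t in mono_tgt]  # synthetic pairs
        data = list(parallel) + synthetic
        forward = train(data)                            # retrain forward
    return forward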
2021
The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task
Chen Xu | Xiaoqian Liu | Xiaowen Liu | Tiger Wang | Canan Huang | Tong Xiao | Jingbo Zhu
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
This paper describes the submission of the NiuTrans end-to-end speech translation system for the IWSLT 2021 offline task, which translates English audio directly into German text without intermediate transcription. We use a Transformer-based model architecture and enhance it with Conformer, relative position encoding, and stacked acoustic and textual encoding. To augment the training data, the English transcriptions are translated into German. Finally, we employ ensemble decoding to integrate the predictions of several models trained on different datasets. Combining these techniques, we achieve 33.84 BLEU points on the MuST-C En-De test set, which shows the enormous potential of the end-to-end model.
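Ensemble decoding of the kind used in this submission is commonly implemented by averaging per-step probabilities across models; the sketch below shows one greedy step, where the `step` interface on each model is an assumption for illustration rather than the system's actual API.

import math
import torch

def ensemble_step(models, src_feats, prefix):
    """One greedy step of ensemble decoding: average the models'
    next-token distributions in probability space, then take the argmax.
    Each model is assumed to expose step(src_feats, prefix) returning
    (batch, vocab) log-probabilities; a beam search would rank the same
    averaged scores instead of taking a single argmax."""
    log_probs = torch.stack([m.step(src_feats, prefix) for m in models])
    # logsumexp - log(n) averages the probabilities, not the log-probs.
    avg = torch.logsumexp(log_probs, dim=0) - math.log(len(models))
    return avg.argmax(dim=-1)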
This paper describes the submission of the NiuTrans end-to-end speech translation system for the IWSLT 2021 offline task, which translates from the English audio to German text directly without intermediate transcription. We use the Transformer-based model architecture and enhance it by Conformer, relative position encoding, and stacked acoustic and textual encoding. To augment the training data, the English transcriptions are translated to German translations. Finally, we employ ensemble decoding to integrate the predictions from several models trained with the different datasets. Combining these techniques, we achieve 33.84 BLEU points on the MuST-C En-De test set, which shows the enormous potential of the end-to-end model.