Haoran Sun
2023
CKDST: Comprehensively and Effectively Distill Knowledge from Machine Translation to End-to-End Speech Translation
Yikun Lei | Zhengshan Xue | Xiaohu Zhao | Haoran Sun | Shaolin Zhu | Xiaodong Lin | Deyi Xiong
Findings of the Association for Computational Linguistics: ACL 2023
Distilling knowledge from a high-resource task, e.g., machine translation, is an effective way to alleviate the data scarcity problem of end-to-end speech translation. However, previous works simply use classical knowledge distillation, which does not allow for adequate transfer of knowledge from machine translation. In this paper, we propose a comprehensive knowledge distillation framework for speech translation, CKDST, which is capable of comprehensively and effectively distilling knowledge from machine translation to speech translation from two perspectives: cross-modal contrastive representation distillation and simultaneous decoupled knowledge distillation. In the former, we leverage a contrastive learning objective to optimize the mutual information between speech and text representations for representation distillation in the encoder. In the latter, we decouple the non-target class knowledge from target class knowledge for logits distillation in the decoder. Experiments on the MuST-C benchmark dataset demonstrate that our CKDST substantially improves the baseline by 1.2 BLEU on average in all translation directions, and outperforms previous state-of-the-art end-to-end and cascaded speech translation models.
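To make the cross-modal contrastive representation distillation idea concrete, the sketch below shows an InfoNCE-style objective between paired speech and text encoder representations, where maximizing the contrastive objective serves as a proxy for maximizing their mutual information. This is a minimal illustration under assumed shapes and names, not the paper's exact formulation.

```python
# Illustrative sketch of a cross-modal contrastive loss (InfoNCE-style).
# All function and variable names are hypothetical, not taken from CKDST's code.
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(speech_repr, text_repr, temperature=0.1):
    """speech_repr, text_repr: (batch, dim) pooled encoder outputs of paired utterances."""
    speech = F.normalize(speech_repr, dim=-1)
    text = F.normalize(text_repr, dim=-1)
    # Similarity of every speech embedding with every text embedding in the batch.
    logits = speech @ text.t() / temperature              # (batch, batch)
    targets = torch.arange(logits.size(0), device=logits.device)
    # Diagonal entries are the paired positives; the rest act as in-batch negatives.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```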
2022
Language Branch Gated Multilingual Neural Machine Translation
Haoran Sun | Deyi Xiong
Proceedings of the 29th International Conference on Computational Linguistics
Knowledge transfer across languages is crucial for multilingual neural machine translation. In this paper, we propose language branch (LB) gated multilingual neural machine translation that encourages knowledge transfer within the same language branch with an LB-gated module that is integrated into both the encoder and decoder. The LB-gated module distinguishes LB-specific parameters from global parameters shared by all languages and routes languages from the same LB to the corresponding LB-specific network. Comprehensive experiments on the OPUS-100 dataset show that the proposed approach substantially improves translation quality on both middle- and low-resource languages over previous methods. Further analysis demonstrates its ability to learn similarities between language branches.
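The sketch below illustrates the general idea of a language-branch gated module: a gate mixes a globally shared sub-network with a branch-specific one selected by the input's language branch. It is a minimal sketch under assumed shapes and names, not the paper's implementation.

```python
# Illustrative sketch of an LB-gated feed-forward block (names are hypothetical).
import torch
import torch.nn as nn

class LBGatedFFN(nn.Module):
    def __init__(self, d_model, d_ff, num_branches):
        super().__init__()
        # Parameters shared by all languages.
        self.shared = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                    nn.Linear(d_ff, d_model))
        # One branch-specific sub-network per language branch.
        self.branch = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_branches)
        )
        self.gate = nn.Linear(d_model, 1)  # per-token scalar gate

    def forward(self, x, branch_id):
        # x: (batch, seq_len, d_model); branch_id routes to the LB-specific network.
        g = torch.sigmoid(self.gate(x))                    # (batch, seq_len, 1)
        return g * self.branch[branch_id](x) + (1 - g) * self.shared(x)
```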
Co-authors
- Deyi Xiong 2
- Yikun Lei 1
- Zhengshan Xue 1
- Xiaohu Zhao 1
- Shaolin Zhu 1
- Xiaodong Lin 1