TenTrans Large-Scale Multilingual Machine Translation System for WMT21

Wanying Xie, Bojie Hu, Han Yang, Dong Yu, Qi Ju


Abstract
This paper describes the TenTrans large-scale multilingual machine translation system for WMT 2021. We participate in Small Track 2, which covers five South East Asian languages (Javanese, Indonesian, Malay, Tagalog, Tamil) and English, for thirty translation directions in total. We mainly use forward/back-translation, in-domain data selection, knowledge distillation, and gradual fine-tuning from the FLORES-101 pre-trained model. We find that forward/back-translation significantly improves translation quality, that data selection and gradual fine-tuning are particularly effective for domain adaptation, and that knowledge distillation brings a slight improvement. In addition, model averaging is applied on top of these systems to further improve translation performance. Our final system achieves an average BLEU score of 28.89 across the thirty directions on the test set.
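The abstract names model averaging without detailing it. Below is a minimal sketch, assuming it means element-wise averaging of the parameters of several checkpoints that share one architecture, a common practice in NMT. The abstract does not say which checkpoints are averaged, and both `average_checkpoints` and the "parameter name to list of floats" checkpoint representation are hypothetical illustrations, not the TenTrans implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical checkpoint representation: parameter name -> flat list of floats.
# Real toolkits store tensors, but the averaging logic is the same.
Checkpoint = Dict[str, List[float]]


def average_checkpoints(checkpoints: Sequence[Checkpoint]) -> Checkpoint:
    """Return the element-wise mean of parameters across checkpoints.

    All checkpoints must have the same parameter names and shapes,
    i.e. they come from the same model architecture.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != names:
            raise ValueError("checkpoints have different parameter names")

    n = len(checkpoints)
    averaged: Checkpoint = {}
    for name in names:
        vectors = [ckpt[name] for ckpt in checkpoints]
        length = len(vectors[0])
        if any(len(v) != length for v in vectors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*vectors) yields the values at the same position in every checkpoint.
        averaged[name] = [sum(values) / n for values in zip(*vectors)]
    return averaged


if __name__ == "__main__":
    ckpts = [
        {"w": [1.0, 2.0], "b": [0.0]},
        {"w": [3.0, 4.0], "b": [1.0]},
    ]
    print(average_checkpoints(ckpts))  # {'w': [2.0, 3.0], 'b': [0.5]}
```

Averaging parameters (rather than ensembling outputs) keeps inference cost equal to that of a single model, which is one reason the technique is popular as a final step.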
Anthology ID:
2021.wmt-1.53
Volume:
Proceedings of the Sixth Conference on Machine Translation
Month:
November
Year:
2021
Address:
Online
Venues:
EMNLP | WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
439–445
URL:
https://aclanthology.org/2021.wmt-1.53
Cite (ACL):
Wanying Xie, Bojie Hu, Han Yang, Dong Yu, and Qi Ju. 2021. TenTrans Large-Scale Multilingual Machine Translation System for WMT21. In Proceedings of the Sixth Conference on Machine Translation, pages 439–445, Online. Association for Computational Linguistics.
Cite (Informal):
TenTrans Large-Scale Multilingual Machine Translation System for WMT21 (Xie et al., WMT 2021)
PDF:
https://preview.aclanthology.org/update-css-js/2021.wmt-1.53.pdf
Data
FLORES-101