TenTrans Multilingual Low-Resource Translation System for WMT21 Indo-European Languages Task

Han Yang, Bojie Hu, Wanying Xie, Ambyera Han, Pan Liu, Jinan Xu, Qi Ju


Abstract
This paper describes TenTrans' submission to the WMT21 Multilingual Low-Resource Translation shared task for the Romance language pairs. This task focuses on improving translation quality from Catalan to Occitan, Romanian, and Italian, with the assistance of related high-resource languages. We mainly use back-translation, pivot-based methods, multilingual models, pre-trained model fine-tuning, and in-domain knowledge transfer to improve translation quality. On the test set, our best submitted system achieves an average case-sensitive BLEU score of 43.45 across all low-resource pairs. The data, code, and pre-trained models used in this work are available in the TenTrans evaluation examples.
Anthology ID:
2021.wmt-1.45
Volume:
Proceedings of the Sixth Conference on Machine Translation
Month:
November
Year:
2021
Address:
Online
Venues:
EMNLP | WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
376–382
URL:
https://aclanthology.org/2021.wmt-1.45
Cite (ACL):
Han Yang, Bojie Hu, Wanying Xie, Ambyera Han, Pan Liu, Jinan Xu, and Qi Ju. 2021. TenTrans Multilingual Low-Resource Translation System for WMT21 Indo-European Languages Task. In Proceedings of the Sixth Conference on Machine Translation, pages 376–382, Online. Association for Computational Linguistics.
Cite (Informal):
TenTrans Multilingual Low-Resource Translation System for WMT21 Indo-European Languages Task (Yang et al., WMT 2021)
PDF:
https://preview.aclanthology.org/update-css-js/2021.wmt-1.45.pdf