TenTrans Multilingual Low-Resource Translation System for WMT21 Indo-European Languages Task
Han Yang, Bojie Hu, Wanying Xie, Ambyera Han, Pan Liu, Jinan Xu, Qi Ju
Abstract
This paper describes TenTrans’ submission to the WMT21 Multilingual Low-Resource Translation shared task for the Romance language pairs. The task focuses on improving translation quality from Catalan to Occitan, Romanian, and Italian with the assistance of related high-resource languages. We mainly use back-translation, pivot-based methods, multilingual models, pre-trained model fine-tuning, and in-domain knowledge transfer to improve translation quality. On the test set, our best submitted system achieves an average case-sensitive BLEU score of 43.45 across all low-resource pairs. The data, code, and pre-trained models used in this work are available in the TenTrans evaluation examples.
- Anthology ID:
- 2021.wmt-1.45
- Volume:
- Proceedings of the Sixth Conference on Machine Translation
- Month:
- November
- Year:
- 2021
- Address:
- Online
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 376–382
- URL:
- https://aclanthology.org/2021.wmt-1.45
- Cite (ACL):
- Han Yang, Bojie Hu, Wanying Xie, Ambyera Han, Pan Liu, Jinan Xu, and Qi Ju. 2021. TenTrans Multilingual Low-Resource Translation System for WMT21 Indo-European Languages Task. In Proceedings of the Sixth Conference on Machine Translation, pages 376–382, Online. Association for Computational Linguistics.
- Cite (Informal):
- TenTrans Multilingual Low-Resource Translation System for WMT21 Indo-European Languages Task (Yang et al., WMT 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.wmt-1.45.pdf