The VolcTrans System for WMT22 Multilingual Machine Translation Task

Xian Qian, Kai Hu, Jiaqiang Wang, Yifeng Liu, Xingyuan Pan, Jun Cao, Mingxuan Wang


Abstract
This report describes our VolcTrans system for the WMT22 shared task on large-scale multilingual machine translation. We participated in the unconstrained track, which allows the use of external resources. Our system is a transformer-based multilingual model trained on data from multiple sources, including the public training set from the data track, NLLB data provided by Meta AI, self-collected parallel corpora, and pseudo bitext from back-translation. Both bilingual and monolingual texts are cleaned by a series of heuristic rules. On the official test set, our system achieves 17.3 BLEU, 21.9 spBLEU, and 41.9 chrF2++ on average over all language pairs. The average inference speed is 11.5 sentences per second on a single NVIDIA Tesla V100 GPU.
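The abstract says the texts were "cleaned by a series of heuristic rules" but does not list them here. The sketch below illustrates what rule-based bitext filters of this kind commonly look like; every rule and threshold in it (the token cap, length-ratio cap, and alphabetic-character share) is an illustrative assumption, not the filters actually used by VolcTrans.

```python
# Illustrative sketch of rule-based parallel-data filters of the kind the
# abstract alludes to ("cleaned by a series of heuristic rules").
# All rules and thresholds below are hypothetical, not taken from the paper.

def keep_pair(src: str, tgt: str,
              max_len: int = 250,      # assumed token cap per side
              max_ratio: float = 3.0,  # assumed length-ratio cap
              min_alpha: float = 0.5   # assumed minimum alphabetic share
              ) -> bool:
    """Return True if the (src, tgt) sentence pair passes all filters."""
    src, tgt = src.strip(), tgt.strip()
    # Drop empty lines and identical pairs (likely copy/alignment noise).
    if not src or not tgt or src == tgt:
        return False
    src_toks, tgt_toks = src.split(), tgt.split()
    # Drop overly long sentences.
    if len(src_toks) > max_len or len(tgt_toks) > max_len:
        return False
    # Drop pairs whose lengths differ too much to be plausible translations.
    lo, hi = sorted((len(src_toks), len(tgt_toks)))
    if hi / max(1, lo) > max_ratio:
        return False
    # Drop lines dominated by digits, punctuation, or markup.
    for text in (src, tgt):
        if sum(c.isalpha() for c in text) / len(text) < min_alpha:
            return False
    return True


if __name__ == "__main__":
    pairs = [
        ("Hello world .", "Bonjour le monde ."),  # kept
        ("12345 !!!", "67890 ???"),               # dropped: mostly non-alphabetic
        ("Hi", " ".join(["word"] * 20)),          # dropped: extreme length ratio
    ]
    for s, t in pairs:
        print(keep_pair(s, t), repr(s), "->", repr(t[:40]))
```

In practice such filters are usually applied per language pair and combined with model-based scoring or language identification, but those specifics are beyond what the abstract states.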
Anthology ID:
2022.wmt-1.104
Volume:
Proceedings of the Seventh Conference on Machine Translation (WMT)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
1068–1075
URL:
https://aclanthology.org/2022.wmt-1.104
Cite (ACL):
Xian Qian, Kai Hu, Jiaqiang Wang, Yifeng Liu, Xingyuan Pan, Jun Cao, and Mingxuan Wang. 2022. The VolcTrans System for WMT22 Multilingual Machine Translation Task. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 1068–1075, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
The VolcTrans System for WMT22 Multilingual Machine Translation Task (Qian et al., WMT 2022)
PDF:
https://aclanthology.org/2022.wmt-1.104.pdf
Dataset:
2022.wmt-1.104.dataset.zip