The VolcTrans System for WMT22 Multilingual Machine Translation Task
Xian Qian | Kai Hu | Jiaqiang Wang | Yifeng Liu | Xingyuan Pan | Jun Cao | Mingxuan Wang
Proceedings of the Seventh Conference on Machine Translation (WMT)
This report describes our VolcTrans system for the WMT22 shared task on large-scale multilingual machine translation. We participated in the unconstrained track, which allows the use of external resources. Our system is a Transformer-based multilingual model trained on data from multiple sources, including the public training set from the data track, NLLB data provided by Meta AI, self-collected parallel corpora, and pseudo bitext from back-translation. Both bilingual and monolingual texts are cleaned with a series of heuristic rules. On the official test set, our system achieves $17.3$ BLEU, $21.9$ spBLEU, and $41.9$ chrF2++ on average over all language pairs. The average inference speed is $11.5$ sentences per second on a single Nvidia Tesla V100 GPU.
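To make the idea of heuristic cleaning concrete, the following is a minimal sketch of rule-based filtering for parallel text. The abstract does not list the actual rules, so every rule here (empty-side removal, a length cap, a length-ratio cap, copy detection, exact deduplication), the thresholds `MAX_TOKENS` and `MAX_LENGTH_RATIO`, and the function names `keep_pair` and `clean_bitext` are illustrative assumptions rather than the system's real pipeline.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical thresholds chosen for illustration only; the abstract
# does not state the actual rules or values used by the system.
MAX_TOKENS = 250
MAX_LENGTH_RATIO = 3.0


def keep_pair(src: str, tgt: str) -> bool:
    """Return True if a sentence pair passes the illustrative filters."""
    # Whitespace tokenization is a simplification; it is not suitable
    # for languages written without spaces.
    src_tokens, tgt_tokens = src.split(), tgt.split()
    # Drop pairs with an empty side.
    if not src_tokens or not tgt_tokens:
        return False
    # Drop overly long sentences.
    if len(src_tokens) > MAX_TOKENS or len(tgt_tokens) > MAX_TOKENS:
        return False
    # Drop pairs whose token counts differ too much.
    longer = max(len(src_tokens), len(tgt_tokens))
    shorter = min(len(src_tokens), len(tgt_tokens))
    if longer / shorter > MAX_LENGTH_RATIO:
        return False
    # Drop pairs where the target is just a copy of the source.
    if src.strip() == tgt.strip():
        return False
    return True


def clean_bitext(pairs: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield stripped, deduplicated pairs that pass keep_pair."""
    seen = set()
    for src, tgt in pairs:
        key = (src.strip(), tgt.strip())
        if key in seen or not keep_pair(src, tgt):
            continue
        seen.add(key)
        yield key
```

A monolingual variant would apply the same per-sentence checks to a single side. In practice, such rules are usually tuned per language, since a fixed length ratio suits some language pairs better than others.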