The VolcTrans System for WMT22 Multilingual Machine Translation Task
Xian Qian, Kai Hu, Jiaqiang Wang, Yifeng Liu, Xingyuan Pan, Jun Cao, Mingxuan Wang
Abstract
This report describes our VolcTrans system for the WMT22 shared task on large-scale multilingual machine translation. We participated in the unconstrained track, which allows the use of external resources. Our system is a transformer-based multilingual model trained on data from multiple sources, including the public training set from the data track, NLLB data provided by Meta AI, self-collected parallel corpora, and pseudo-bitext from back-translation. Both bilingual and monolingual texts are cleaned by a series of heuristic rules. On the official test set, our system achieves 17.3 BLEU, 21.9 spBLEU, and 41.9 chrF2++ on average over all language pairs. Average inference speed is 11.5 sentences per second on a single Nvidia Tesla V100 GPU.

- Anthology ID: 2022.wmt-1.104
- Volume: Proceedings of the Seventh Conference on Machine Translation (WMT)
- Month: December
- Year: 2022
- Address: Abu Dhabi, United Arab Emirates (Hybrid)
- Venue: WMT
- Publisher: Association for Computational Linguistics
- Pages: 1068–1075
- URL: https://aclanthology.org/2022.wmt-1.104
- Cite (ACL): Xian Qian, Kai Hu, Jiaqiang Wang, Yifeng Liu, Xingyuan Pan, Jun Cao, and Mingxuan Wang. 2022. The VolcTrans System for WMT22 Multilingual Machine Translation Task. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 1068–1075, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Cite (Informal): The VolcTrans System for WMT22 Multilingual Machine Translation Task (Qian et al., WMT 2022)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2022.wmt-1.104.pdf