No Domain Left behind

Hui Zeng


Abstract
We participated in the WMT General MT task, focusing on four high-resource language pairs: English to Chinese, Chinese to English, English to Japanese, and Japanese to English. The submitted systems (LanguageX) focus on data cleaning, data selection, data mixing, and TM-augmented NMT. Rules and a multilingual language model are used for data filtering and data selection. In the automatic evaluation, our best submitted English to Chinese system achieved a 54.3 BLEU score and a 63.8 COMET score, the highest among all submissions.
Anthology ID:
2022.wmt-1.38
Volume:
Proceedings of the Seventh Conference on Machine Translation (WMT)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
423–427
URL:
https://aclanthology.org/2022.wmt-1.38
Cite (ACL):
Hui Zeng. 2022. No Domain Left behind. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 423–427, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
No Domain Left behind (Zeng, WMT 2022)
PDF:
https://preview.aclanthology.org/author-url/2022.wmt-1.38.pdf