NJUNLP’s Participation for the WMT2022 Quality Estimation Shared Task

Xiang Geng, Yu Zhang, Shujian Huang, Shimin Tao, Hao Yang, Jiajun Chen


Abstract
This paper presents the submissions of the NJUNLP team to WMT 2022 Quality Estimation shared task 1, where the goal is to predict the sentence-level and word-level quality of target machine translations. Our system explores pseudo data and multi-task learning. We propose several novel methods to generate pseudo data for the different annotations using a conditional masked language model and a neural machine translation model. The proposed methods control the decoding process to generate more realistic pseudo translations. We pre-train the XLM-R large model with pseudo data and then fine-tune it with real data, in both cases via multi-task learning. We jointly learn sentence-level scores (with regression and ranking tasks) and word-level tags (with a sequence tagging task). Our system obtains competitive results on different language pairs and ranks first on both the sentence- and word-level sub-tasks of the English-German language pair.
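The abstract describes jointly training three objectives: sentence-level score regression, sentence-level ranking, and word-level OK/BAD tagging. As a rough sketch of such a joint multi-task objective (the loss functions, margin, and weights here are illustrative assumptions, not details taken from the paper):

```python
import math

def sentence_regression_loss(pred, gold):
    """MSE between predicted and gold sentence-level quality scores."""
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred)

def sentence_rank_loss(pred, gold, margin=0.1):
    """Pairwise margin ranking loss over sentence pairs ordered by gold score.
    The margin value is a hypothetical choice for illustration."""
    loss, pairs = 0.0, 0
    for i in range(len(pred)):
        for j in range(len(pred)):
            if gold[i] > gold[j]:
                # Penalize when the predicted gap is smaller than the margin.
                loss += max(0.0, margin - (pred[i] - pred[j]))
                pairs += 1
    return loss / max(pairs, 1)

def word_tag_loss(logits, tags):
    """Token-level cross-entropy over word tags (e.g. 0 = OK, 1 = BAD)."""
    total = 0.0
    for scores, t in zip(logits, tags):
        log_z = math.log(sum(math.exp(s) for s in scores))
        total += log_z - scores[t]  # negative log-likelihood of the gold tag
    return total / len(tags)

def joint_loss(sent_pred, sent_gold, tag_logits, tags, w=(1.0, 1.0, 1.0)):
    """Weighted sum of the three objectives; the weights w are illustrative."""
    return (w[0] * sentence_regression_loss(sent_pred, sent_gold)
            + w[1] * sentence_rank_loss(sent_pred, sent_gold)
            + w[2] * word_tag_loss(tag_logits, tags))
```

In a real system these heads would sit on top of a shared XLM-R encoder, with the same combined loss used both during pre-training on pseudo data and during fine-tuning on real annotations.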
Anthology ID:
2022.wmt-1.57
Volume:
Proceedings of the Seventh Conference on Machine Translation (WMT)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
615–620
URL:
https://aclanthology.org/2022.wmt-1.57
Cite (ACL):
Xiang Geng, Yu Zhang, Shujian Huang, Shimin Tao, Hao Yang, and Jiajun Chen. 2022. NJUNLP’s Participation for the WMT2022 Quality Estimation Shared Task. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 615–620, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
NJUNLP’s Participation for the WMT2022 Quality Estimation Shared Task (Geng et al., WMT 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.wmt-1.57.pdf