Papago’s Submission for the WMT21 Quality Estimation Shared Task

Seunghyun Lim, Hantae Kim, Hyunjoong Kim


Abstract
This paper describes Papago’s submission to the WMT 2021 Quality Estimation Task 1: Sentence-level Direct Assessment. Our multilingual Quality Estimation system explores the combination of Pretrained Language Models and Multi-task Learning architectures. We propose an iterative training pipeline based on pretraining with large amounts of in-domain synthetic data and finetuning with gold (labeled) data. We then compress our system via knowledge distillation to reduce the parameter count while maintaining strong performance. Our submitted multilingual systems perform competitively in the multilingual setting and in all 11 individual language pair settings, including zero-shot.
Anthology ID:
2021.wmt-1.98
Volume:
Proceedings of the Sixth Conference on Machine Translation
Month:
November
Year:
2021
Address:
Online
Editors:
Loic Barrault, Ondrej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussa, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Tom Kocmi, Andre Martins, Makoto Morishita, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
935–940
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2021.wmt-1.98/
Cite (ACL):
Seunghyun Lim, Hantae Kim, and Hyunjoong Kim. 2021. Papago’s Submission for the WMT21 Quality Estimation Shared Task. In Proceedings of the Sixth Conference on Machine Translation, pages 935–940, Online. Association for Computational Linguistics.
Cite (Informal):
Papago’s Submission for the WMT21 Quality Estimation Shared Task (Lim et al., WMT 2021)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2021.wmt-1.98.pdf