Tencent AI Lab Machine Translation Systems for WMT20 Chat Translation Task

Longyue Wang, Zhaopeng Tu, Xing Wang, Li Ding, Liang Ding, Shuming Shi


Abstract
This paper describes the Tencent AI Lab's submission to the WMT 2020 shared task on chat translation in English-German. Our neural machine translation (NMT) systems are built on sentence-level, document-level, non-autoregressive (NAT) and pretrained models. We integrate a number of advanced techniques into our systems, including data selection, back/forward translation, larger-batch learning, model ensembling, fine-tuning and system combination. Specifically, we propose a hybrid data selection method to select high-quality, in-domain sentences from out-of-domain data. To better capture source contexts, we augment NAT models with evolved cross-attention. Furthermore, we explore transferring general knowledge from four different pre-trained language models to the downstream translation task. In general, we present extensive experimental results for this new translation task. Among all participants, our German-to-English primary system ranked second in terms of BLEU score.
Anthology ID:
2020.wmt-1.60
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
483–491
URL:
https://aclanthology.org/2020.wmt-1.60
Cite (ACL):
Longyue Wang, Zhaopeng Tu, Xing Wang, Li Ding, Liang Ding, and Shuming Shi. 2020. Tencent AI Lab Machine Translation Systems for WMT20 Chat Translation Task. In Proceedings of the Fifth Conference on Machine Translation, pages 483–491, Online. Association for Computational Linguistics.
Cite (Informal):
Tencent AI Lab Machine Translation Systems for WMT20 Chat Translation Task (Wang et al., WMT 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.wmt-1.60.pdf
Video:
https://slideslive.com/38939671