Tencent AI Lab Machine Translation Systems for WMT20 Chat Translation Task
Longyue Wang, Zhaopeng Tu, Xing Wang, Li Ding, Liang Ding, Shuming Shi
Abstract
This paper describes Tencent AI Lab’s submission to the WMT 2020 shared task on English-German chat translation. Our neural machine translation (NMT) systems are built on sentence-level, document-level, non-autoregressive (NAT), and pretrained models. We integrate a number of advanced techniques into our systems, including data selection, back/forward translation, larger batch learning, model ensembling, finetuning, and system combination. Specifically, we propose a hybrid data selection method to select high-quality, in-domain sentences from out-of-domain data. To better capture source contexts, we augment NAT models with evolved cross-attention. Furthermore, we explore transferring general knowledge from four different pretrained language models to the downstream translation task. We present extensive experimental results for this new translation task. Among all participants, our German-to-English primary system ranked second in terms of BLEU score.
- Anthology ID: 2020.wmt-1.60
- Volume: Proceedings of the Fifth Conference on Machine Translation
- Month: November
- Year: 2020
- Address: Online
- Venue: WMT
- SIG: SIGMT
- Publisher: Association for Computational Linguistics
- Pages: 483–491
- URL: https://aclanthology.org/2020.wmt-1.60
- Cite (ACL): Longyue Wang, Zhaopeng Tu, Xing Wang, Li Ding, Liang Ding, and Shuming Shi. 2020. Tencent AI Lab Machine Translation Systems for WMT20 Chat Translation Task. In Proceedings of the Fifth Conference on Machine Translation, pages 483–491, Online. Association for Computational Linguistics.
- Cite (Informal): Tencent AI Lab Machine Translation Systems for WMT20 Chat Translation Task (Wang et al., WMT 2020)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2020.wmt-1.60.pdf