Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation

Wenxuan Wang, Wenxiang Jiao, Yongchang Hao, Xing Wang, Shuming Shi, Zhaopeng Tu, Michael Lyu


Abstract
In this paper, we present a substantial step toward better understanding state-of-the-art sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT). We focus on the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT. Through carefully designed experiments on three language pairs, we find that Seq2Seq pretraining is a double-edged sword: on one hand, it helps NMT models produce more diverse translations and reduces adequacy-related translation errors; on the other hand, the discrepancies between Seq2Seq pretraining and NMT finetuning limit translation quality (i.e., domain discrepancy) and induce an over-estimation issue (i.e., objective discrepancy). Based on these observations, we propose two simple and effective strategies, in-domain pretraining and input adaptation, to remedy the domain and objective discrepancies, respectively. Experimental results on several language pairs show that our approach consistently improves both translation performance and model robustness over Seq2Seq pretraining.
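The abstract names the two remedies only at a high level. As a rough illustration (not the paper's exact recipe), input adaptation can be pictured as exposing the finetuned NMT model to pretraining-style noisy source inputs, so that decoding behavior stays consistent between Seq2Seq pretraining and translation. The minimal sketch below uses a hypothetical add_noise helper that masks a random source span, mBART-style; the mask ratio and mixing fraction are illustrative assumptions, not values taken from the paper.

```python
import random

MASK = "<mask>"

def add_noise(tokens, mask_ratio=0.35, seed=None):
    """Mask one random contiguous span covering ~mask_ratio of the tokens
    (a rough, mBART-style text-infilling stand-in; not the paper's exact scheme)."""
    rng = random.Random(seed)
    n = len(tokens)
    span = max(1, int(n * mask_ratio))
    start = rng.randint(0, max(0, n - span))
    return tokens[:start] + [MASK] + tokens[start + span:]

def adapt_batch(src_sentences, noisy_fraction=0.5, seed=0):
    """Replace a fraction of clean source sentences with noised versions,
    so finetuning inputs resemble pretraining inputs (illustrative only)."""
    rng = random.Random(seed)
    adapted = []
    for sent in src_sentences:
        tokens = sent.split()
        if rng.random() < noisy_fraction:
            tokens = add_noise(tokens, seed=rng.randint(0, 10**6))
        adapted.append(" ".join(tokens))
    return adapted

if __name__ == "__main__":
    batch = [
        "the quick brown fox jumps over the lazy dog",
        "seq2seq pretraining helps neural machine translation",
    ]
    print(adapt_batch(batch))
```

In the same spirit, in-domain pretraining simply runs the Seq2Seq pretraining objective on monolingual data drawn from the target translation domain before finetuning; the sketch above only illustrates the input-side adaptation.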
Anthology ID:
2022.acl-long.185
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2591–2600
URL:
https://aclanthology.org/2022.acl-long.185
DOI:
10.18653/v1/2022.acl-long.185
Cite (ACL):
Wenxuan Wang, Wenxiang Jiao, Yongchang Hao, Xing Wang, Shuming Shi, Zhaopeng Tu, and Michael Lyu. 2022. Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2591–2600, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation (Wang et al., ACL 2022)
PDF:
https://preview.aclanthology.org/landing_page/2022.acl-long.185.pdf