What Works and Doesn’t Work, A Deep Decoder for Neural Machine Translation

Zuchao Li, Yiran Wang, Masao Utiyama, Eiichiro Sumita, Hai Zhao, Taro Watanabe


Abstract
Deep learning has demonstrated performance advantages across a wide range of natural language processing tasks, including neural machine translation (NMT). Transformer NMT models are typically strengthened by deeper encoder layers, but deepening their decoder layers usually results in failure. In this paper, we first identify the cause of the deep decoder's failure in the Transformer model. Inspired by this finding, we then propose approaches, in terms of both model structure and model training, to make the deep decoder practical in NMT. Specifically, for model structure, we propose a cross-attention drop mechanism that allows the decoder layers to take on distinct roles, reducing the difficulty of deep-decoder learning. For model training, we propose a collapse-reducing training approach that improves the stability and effectiveness of deep-decoder training. We experimentally evaluated the proposed structural modification and the novel training method on several popular machine translation benchmarks. The results show that our approach makes it possible to deepen the NMT model by increasing the number of decoder layers while preventing the deepened decoder from degrading into an unconditional language model. In contrast to prior work that deepens NMT models only on the encoder side, our method can deepen the model on both the encoder and the decoder at the same time, resulting in a deeper model and improved performance.
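The abstract names a "cross-attention drop mechanism" without giving implementation detail. As a rough illustration only, the minimal PyTorch sketch below shows one plausible reading: selected decoder layers skip their encoder-decoder (cross) attention sub-layer, here stochastically during training, so that individual layers can specialize. The class name, the drop probability `cross_attn_drop_p`, and the stochastic (rather than fixed per-layer) schedule are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch (assumed reading, not the paper's exact method): a pre-norm
# Transformer decoder layer whose cross-attention sub-layer is stochastically
# skipped during training.
import torch
import torch.nn as nn


class DecoderLayerWithCrossAttnDrop(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048,
                 dropout=0.1, cross_attn_drop_p=0.2):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads,
                                                dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        # Assumed hyperparameter: probability of skipping cross-attention.
        self.cross_attn_drop_p = cross_attn_drop_p

    def forward(self, tgt, memory, tgt_mask=None):
        # Masked self-attention over the target prefix (pre-norm residual).
        x = self.norm1(tgt)
        x, _ = self.self_attn(x, x, x, attn_mask=tgt_mask, need_weights=False)
        tgt = tgt + self.dropout(x)

        # Cross-attention to the encoder output, dropped with some probability
        # during training so this layer sometimes behaves like a pure
        # target-side layer; always kept at inference in this sketch.
        keep_cross = (not self.training) or \
                     (torch.rand(()).item() >= self.cross_attn_drop_p)
        if keep_cross:
            x = self.norm2(tgt)
            x, _ = self.cross_attn(x, memory, memory, need_weights=False)
            tgt = tgt + self.dropout(x)

        # Position-wise feed-forward sub-layer.
        x = self.norm3(tgt)
        tgt = tgt + self.dropout(self.ffn(x))
        return tgt


if __name__ == "__main__":
    layer = DecoderLayerWithCrossAttnDrop()
    tgt = torch.randn(2, 7, 512)      # (batch, target length, d_model)
    memory = torch.randn(2, 9, 512)   # encoder output
    print(layer(tgt, memory).shape)   # torch.Size([2, 7, 512])
```

An actual system might instead remove cross-attention deterministically from a fixed subset of decoder layers; the stochastic variant above is just the shortest way to show the idea of letting some layers operate without source conditioning.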
Anthology ID: 2022.findings-acl.39
Volume: Findings of the Association for Computational Linguistics: ACL 2022
Month: May
Year: 2022
Address: Dublin, Ireland
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 459–471
URL: https://aclanthology.org/2022.findings-acl.39
DOI: 10.18653/v1/2022.findings-acl.39
Cite (ACL): Zuchao Li, Yiran Wang, Masao Utiyama, Eiichiro Sumita, Hai Zhao, and Taro Watanabe. 2022. What Works and Doesn’t Work, A Deep Decoder for Neural Machine Translation. In Findings of the Association for Computational Linguistics: ACL 2022, pages 459–471, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): What Works and Doesn’t Work, A Deep Decoder for Neural Machine Translation (Li et al., Findings 2022)
PDF: https://preview.aclanthology.org/auto-file-uploads/2022.findings-acl.39.pdf