On the Sub-layer Functionalities of Transformer Decoder

Yilin Yang, Longyue Wang, Shuming Shi, Prasad Tadepalli, Stefan Lee, Zhaopeng Tu


Abstract
There have been significant efforts to interpret the encoder of Transformer-based encoder-decoder architectures for neural machine translation (NMT); meanwhile, the decoder remains largely unexamined despite its critical role. During translation, the decoder must predict output tokens by considering both the source-language text from the encoder and the target-language prefix produced in previous steps. In this work, we study how Transformer-based decoders leverage information from the source and target languages – developing a universal probe task to assess how information is propagated through each module of each decoder layer. We perform extensive experiments on three major translation datasets (WMT En-De, En-Fr, and En-Zh). Our analysis provides insight on when and where decoders leverage different sources. Based on these insights, we demonstrate that the residual feed-forward module in each Transformer decoder layer can be dropped with minimal loss of performance – a significant reduction in computation and number of parameters, and consequently a significant boost to both training and inference speed.
Anthology ID:
2020.findings-emnlp.432
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Editors:
Trevor Cohn, Yulan He, Yang Liu
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
4799–4811
Language:
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2020.findings-emnlp.432/
DOI:
10.18653/v1/2020.findings-emnlp.432
Bibkey:
Cite (ACL):
Yilin Yang, Longyue Wang, Shuming Shi, Prasad Tadepalli, Stefan Lee, and Zhaopeng Tu. 2020. On the Sub-layer Functionalities of Transformer Decoder. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4799–4811, Online. Association for Computational Linguistics.
Cite (Informal):
On the Sub-layer Functionalities of Transformer Decoder (Yang et al., Findings 2020)
Copy Citation:
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2020.findings-emnlp.432.pdf
Video:
 https://slideslive.com/38940141