On the Sub-layer Functionalities of Transformer Decoder
Yilin Yang, Longyue Wang, Shuming Shi, Prasad Tadepalli, Stefan Lee, Zhaopeng Tu
Abstract
There have been significant efforts to interpret the encoder of Transformer-based encoder-decoder architectures for neural machine translation (NMT); meanwhile, the decoder remains largely unexamined despite its critical role. During translation, the decoder must predict output tokens by considering both the source-language text from the encoder and the target-language prefix produced in previous steps. In this work, we study how Transformer-based decoders leverage information from the source and target languages – developing a universal probe task to assess how information is propagated through each module of each decoder layer. We perform extensive experiments on three major translation datasets (WMT En-De, En-Fr, and En-Zh). Our analysis provides insight on when and where decoders leverage different sources. Based on these insights, we demonstrate that the residual feed-forward module in each Transformer decoder layer can be dropped with minimal loss of performance – a significant reduction in computation and number of parameters, and consequently a significant boost to both training and inference speed.
- Anthology ID:
- 2020.findings-emnlp.432
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2020
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Trevor Cohn, Yulan He, Yang Liu
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4799–4811
- URL:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2020.findings-emnlp.432/
- DOI:
- 10.18653/v1/2020.findings-emnlp.432
- Cite (ACL):
- Yilin Yang, Longyue Wang, Shuming Shi, Prasad Tadepalli, Stefan Lee, and Zhaopeng Tu. 2020. On the Sub-layer Functionalities of Transformer Decoder. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4799–4811, Online. Association for Computational Linguistics.
- Cite (Informal):
- On the Sub-layer Functionalities of Transformer Decoder (Yang et al., Findings 2020)
- PDF:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2020.findings-emnlp.432.pdf
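To make the headline finding concrete, the following is a minimal sketch (not the authors' released code) of a Transformer decoder layer that keeps the masked self-attention and cross-attention sub-layers but drops the residual feed-forward sub-layer, as the abstract describes. It assumes PyTorch; the class name, hyperparameters, and shapes are illustrative only.

```python
import torch
import torch.nn as nn


class DecoderLayerNoFFN(nn.Module):
    """Decoder layer with self-attention and cross-attention only (FFN sub-layer dropped)."""

    def __init__(self, d_model: int = 512, nhead: int = 8, dropout: float = 0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        # A standard layer would additionally have a position-wise feed-forward block
        # (Linear -> ReLU -> Linear) with its own residual connection and LayerNorm;
        # that third sub-layer is the one omitted here.

    def forward(self, tgt, memory, tgt_mask=None, memory_mask=None):
        # Masked self-attention over the target-language prefix.
        x = self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)[0]
        tgt = self.norm1(tgt + self.dropout(x))
        # Cross-attention over the encoder's source-language representations.
        x = self.cross_attn(tgt, memory, memory, attn_mask=memory_mask)[0]
        tgt = self.norm2(tgt + self.dropout(x))
        return tgt


# Quick shape check with random tensors (illustrative sizes).
layer = DecoderLayerNoFFN()
tgt = torch.randn(2, 7, 512)      # (batch, target length, d_model)
memory = torch.randn(2, 11, 512)  # (batch, source length, d_model)
print(layer(tgt, memory).shape)   # torch.Size([2, 7, 512])
```

Removing the feed-forward block in this way eliminates the two largest linear projections in each decoder layer, which is where the parameter and computation savings reported in the abstract come from.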