How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures

Tobias Domhan


Abstract
With recent advances in network architectures for Neural Machine Translation (NMT), recurrent models have effectively been replaced by either convolutional or self-attentional approaches, such as the Transformer. While the main innovation of the Transformer architecture is its use of self-attentional layers, several other aspects, such as attention with multiple heads and the use of many attention layers, distinguish the model from previous baselines. In this work we take a fine-grained look at the different architectures for NMT. We introduce an Architecture Definition Language (ADL) that allows common building blocks to be combined flexibly. Using this language, we show experimentally that recurrent and convolutional models can be brought very close to Transformer performance by borrowing concepts from the Transformer architecture, but without using self-attention. Additionally, we find that self-attention is much more important on the encoder side than on the decoder side, where it can be replaced by an RNN or CNN without a loss in performance in most settings. Surprisingly, even a model without any target-side self-attention performs well.
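The sketch below illustrates the idea behind the ADL: encoder and decoder stacks are composed from interchangeable building blocks (self-attention, recurrence, feed-forward), so that, for example, decoder self-attention can be swapped for an RNN. This is a minimal PyTorch sketch under stated assumptions; the block names, the specification strings, and the build_stack helper are hypothetical and do not reproduce the paper's actual ADL syntax or Sockeye's API, and source attention and causal masking are omitted for brevity.

import torch
import torch.nn as nn

class SelfAttentionBlock(nn.Module):
    # Multi-head self-attention with a residual connection and layer normalization.
    def __init__(self, d_model, heads=8):
        super().__init__()
        self.att = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        out, _ = self.att(x, x, x)
        return self.norm(x + out)

class RecurrentBlock(nn.Module):
    # A GRU layer standing in for self-attention, as in the paper's recurrent decoder variants.
    def __init__(self, d_model):
        super().__init__()
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.norm(x + out)

class FeedForwardBlock(nn.Module):
    # Position-wise feed-forward sub-layer with a residual connection.
    def __init__(self, d_model, d_ff=2048):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.ff(x))

def build_stack(spec, d_model=512):
    # Compose a layer stack from a comma-separated block specification
    # (hypothetical syntax, not the paper's ADL).
    blocks = {"self_att": SelfAttentionBlock, "rnn": RecurrentBlock, "ff": FeedForwardBlock}
    return nn.Sequential(*[blocks[name](d_model) for name in spec.split(",")])

# A self-attentional encoder next to a decoder stack whose self-attention is
# replaced by an RNN, one of the variants the paper finds to perform on par.
encoder = build_stack("self_att,ff,self_att,ff")
decoder = build_stack("rnn,ff,rnn,ff")
src = torch.randn(2, 10, 512)        # (batch, sequence length, model dimension)
print(encoder(src).shape)            # torch.Size([2, 10, 512])
print(decoder(src).shape)            # torch.Size([2, 10, 512])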
Anthology ID:
P18-1167
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Iryna Gurevych, Yusuke Miyao
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1799–1808
URL:
https://aclanthology.org/P18-1167
DOI:
10.18653/v1/P18-1167
Cite (ACL):
Tobias Domhan. 2018. How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1799–1808, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures (Domhan, ACL 2018)
PDF:
https://preview.aclanthology.org/nschneid-patch-3/P18-1167.pdf
Poster:
P18-1167.Poster.pdf
Code
 awslabs/sockeye