Bridging the Gap between Position-Based and Content-Based Self-Attention for Neural Machine Translation

Felix Schmidt, Mattia Di Gangi


Abstract
Position-based token-mixing approaches, such as FNet and MLPMixer, have been shown to be exciting attention alternatives for computer vision and natural language understanding. The motivation is usually to remove redundant operations for higher efficiency on consumer GPUs while maintaining Transformer quality. On the hardware side, research on memristive crossbar arrays shows the possibility of efficiency gains of up to two orders of magnitude by performing in-memory computation with weights stored on device. While it is impossible to store dynamic attention weights based on token-token interactions on device, position-based weights represent a concrete alternative if they lead to only minimal degradation. In this paper, we propose position-based attention as a variant of multi-head attention where the attention weights are computed from position representations. A naive replacement of token vectors with position vectors in self-attention results in a significant loss in translation quality, which can be recovered by using relative position representations and a gating mechanism. We show analytically that this gating mechanism introduces some form of word dependency and validate its effectiveness experimentally under various conditions. The resulting network, rPosNet, outperforms previous position-based approaches and matches the quality of the Transformer with relative position embedding while requiring 20% fewer attention parameters after training.
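
To make the mechanism described in the abstract concrete, below is a minimal, hypothetical sketch of a single attention head whose weights depend only on clipped relative positions, combined with a content-dependent sigmoid gate. All names (rel_position_attention, w_rel, w_gate, w_value, max_dist) are illustrative assumptions and this is not the paper's rPosNet implementation; it only shows why such weights can be fixed after training while the gate reintroduces a form of word dependency.

    import numpy as np

    def rel_position_attention(x, w_rel, w_gate, w_value, max_dist=16):
        """Sketch: single-head position-based self-attention with
        relative-position logits and a content gate (hypothetical names).

        x:       (seq_len, d_model) token representations
        w_rel:   (2*max_dist+1,)    learned logit per clipped relative offset
        w_gate:  (d_model, d_model) gate projection
        w_value: (d_model, d_model) value projection
        """
        seq_len, _ = x.shape
        # Attention logits depend only on the clipped relative distance j - i,
        # not on token content, so they are static once training is done.
        offsets = np.clip(
            np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None],
            -max_dist, max_dist) + max_dist
        logits = w_rel[offsets]                          # (seq_len, seq_len)
        attn = np.exp(logits - logits.max(-1, keepdims=True))
        attn /= attn.sum(-1, keepdims=True)              # softmax over positions
        context = attn @ (x @ w_value)                   # position-weighted mixing
        # Content-dependent gate: the output again depends on the tokens,
        # which is the "word dependency" the gating mechanism introduces.
        gate = 1.0 / (1.0 + np.exp(-(x @ w_gate)))       # sigmoid
        return gate * context

Because the softmax weights are a function of positions alone, they could in principle be precomputed and stored on device (e.g., in a memristive crossbar array), while only the gate and value projections require content-dependent computation at inference time.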
Anthology ID:
2023.wmt-1.46
Volume:
Proceedings of the Eighth Conference on Machine Translation
Month:
December
Year:
2023
Address:
Singapore
Editors:
Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
507–521
URL:
https://aclanthology.org/2023.wmt-1.46
DOI:
10.18653/v1/2023.wmt-1.46
Cite (ACL):
Felix Schmidt and Mattia Di Gangi. 2023. Bridging the Gap between Position-Based and Content-Based Self-Attention for Neural Machine Translation. In Proceedings of the Eighth Conference on Machine Translation, pages 507–521, Singapore. Association for Computational Linguistics.
Cite (Informal):
Bridging the Gap between Position-Based and Content-Based Self-Attention for Neural Machine Translation (Schmidt & Di Gangi, WMT 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.wmt-1.46.pdf