Abstract
Recent research questions the importance of dot-product self-attention in Transformer models and shows that most attention heads learn simple positional patterns. In this paper, we push further along this line of research and propose a novel substitute for self-attention: Recurrent AtteNtion (RAN). RAN directly learns attention weights without any token-to-token interaction and further improves their capacity through layer-to-layer interaction. Across an extensive set of experiments on 10 machine translation tasks, we find that RAN models are competitive with, and in certain scenarios outperform, their Transformer counterparts, with fewer parameters and shorter inference time. In particular, applying RAN to the Transformer decoder brings consistent improvements of about +0.5 BLEU on 6 translation tasks and +1.0 BLEU on the Turkish-English translation task. In addition, we conduct an extensive analysis of the attention weights of RAN to confirm their reasonableness. Our RAN is a promising alternative for building more effective and efficient NMT models.
- Anthology ID:
- 2021.emnlp-main.258
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3216–3225
- URL:
- https://aclanthology.org/2021.emnlp-main.258
- DOI:
- 10.18653/v1/2021.emnlp-main.258
- Cite (ACL):
- Jiali Zeng, Shuangzhi Wu, Yongjing Yin, Yufan Jiang, and Mu Li. 2021. Recurrent Attention for Neural Machine Translation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3216–3225, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Recurrent Attention for Neural Machine Translation (Zeng et al., EMNLP 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2021.emnlp-main.258.pdf
- Code
- lemon0830/ran