Attention is Not Only a Weight: Analyzing Transformers with Vector Norms

Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui


Abstract
Attention is a key component of Transformers, which have recently achieved considerable success in natural language processing. Hence, attention is being extensively studied to investigate various linguistic capabilities of Transformers, with a focus on the parallels between attention weights and specific linguistic phenomena. This paper shows that attention weights are only one of the two factors determining the output of attention, and proposes a norm-based analysis that incorporates the second factor: the norm of the transformed input vectors. The findings of our norm-based analyses of BERT and a Transformer-based neural machine translation system include the following: (i) contrary to previous studies, BERT pays little attention to special tokens, and (ii) reasonable word alignments can be extracted from the attention mechanisms of a Transformer. These findings provide insights into the inner workings of Transformers.
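The two factors the abstract refers to can be made concrete with a small sketch. In a Transformer attention head, the output at position i is a weighted sum of transformed input vectors, so each source token j contributes the vector alpha[i, j] * f(x_j). The norm-based analysis measures the size of that contribution, ||alpha[i, j] * f(x_j)||, rather than the weight alpha[i, j] alone. The sketch below is a minimal toy illustration with random matrices, assuming a single head and using f(x) = x @ W_V @ W_O as a stand-in for the value and output projections; all names and dimensions are hypothetical, not from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): sequence length 4, model dimension 8.
seq_len, d = 4, 8
X = rng.normal(size=(seq_len, d))            # input vectors x_j
W_V = rng.normal(size=(d, d)) / np.sqrt(d)   # value projection (toy)
W_O = rng.normal(size=(d, d)) / np.sqrt(d)   # output projection (toy)

# Attention weights alpha: each row is a softmax over source positions.
scores = rng.normal(size=(seq_len, seq_len))
alpha = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# f(x_j): the transformed input vector for each source position j.
FX = X @ W_V @ W_O                           # shape (seq_len, d)

# Weight-based analysis inspects alpha[i, j] alone.
# Norm-based analysis inspects ||alpha[i, j] * f(x_j)||, which factors
# into the attention weight times the norm of the transformed vector.
norm_based = alpha * np.linalg.norm(FX, axis=1)  # shape (seq_len, seq_len)

# A token can receive a large weight yet contribute little to the output
# if ||f(x_j)|| is small -- the effect the norm-based analysis exposes.
print(norm_based)
```

This separation explains how, for example, special tokens can carry large attention weights while their transformed vectors are small in norm, so their actual contribution to the output is minor.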
Anthology ID:
2020.emnlp-main.574
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
7057–7075
URL:
https://aclanthology.org/2020.emnlp-main.574
DOI:
10.18653/v1/2020.emnlp-main.574
Cite (ACL):
Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, and Kentaro Inui. 2020. Attention is Not Only a Weight: Analyzing Transformers with Vector Norms. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7057–7075, Online. Association for Computational Linguistics.
Cite (Informal):
Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (Kobayashi et al., EMNLP 2020)
PDF:
https://preview.aclanthology.org/author-url/2020.emnlp-main.574.pdf
Video:
https://slideslive.com/38938738
Code:
gorokoba560/norm-analysis-of-transformer