@inproceedings{vig-2019-multiscale,
title = "A Multiscale Visualization of Attention in the Transformer Model",
author = "Vig, Jesse",
editor = "Costa-juss{\`a}, Marta R. and
Alfonseca, Enrique",
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/fix-sig-urls/P19-3007/",
doi = "10.18653/v1/P19-3007",
pages = "37--42",
abstract = "The Transformer is a sequence model that forgoes traditional recurrent architectures in favor of a fully attention-based approach. Besides improving performance, an advantage of using attention is that it can also help to interpret a model by showing how the model assigns weight to different input elements. However, the multi-layer, multi-head attention mechanism in the Transformer model can be difficult to decipher. To make the model more accessible, we introduce an open-source tool that visualizes attention at multiple scales, each of which provides a unique perspective on the attention mechanism. We demonstrate the tool on BERT and OpenAI GPT-2 and present three example use cases: detecting model bias, locating relevant attention heads, and linking neurons to model behavior."
}
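
The open-source tool introduced in this paper is BertViz (https://github.com/jessevig/bertviz). As a rough sketch of the head-level view the abstract describes — assuming the current `bertviz` and HuggingFace `transformers` package APIs, which postdate the paper's original release — attention weights can be extracted from BERT and rendered like so:

```python
# Minimal sketch: visualizing BERT's multi-head attention with BertViz.
# Assumes the bertviz and transformers packages (pip install bertviz transformers)
# and a Jupyter notebook for rendering; API names reflect current releases,
# not necessarily the version demonstrated in the paper.
from transformers import AutoModel, AutoTokenizer
from bertviz import head_view

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The doctor asked the nurse a question.", return_tensors="pt")
outputs = model(**inputs)

# One tensor per layer, each of shape (batch, num_heads, seq_len, seq_len).
attention = outputs.attentions
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Interactive head-level view: one attention pattern per layer and head.
head_view(attention, tokens)
```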