More Identifiable yet Equally Performant Transformers for Text Classification

Rishabh Bhardwaj, Navonil Majumder, Soujanya Poria, Eduard Hovy


Abstract
Interpretability is an important aspect of the trustworthiness of a model's predictions. Transformer predictions are widely explained by the attention weights, i.e., a probability distribution generated at its self-attention unit (head). Current empirical studies provide evidence that attention weights are not explanations by showing that they are not unique. A recent study provided theoretical justification for this observation by proving the non-identifiability of attention weights. For a given input to a head and its output, if the attention weights generated in it are unique, we call the weights identifiable. In this work, we provide a deeper theoretical analysis and empirical observations on the identifiability of attention weights. We find that attention weights are more identifiable than currently perceived, by uncovering the hidden role of the key vector, which was ignored in previous works. However, the weights are still prone to being non-unique, which makes them unfit for interpretation. To tackle this issue, we provide a variant of the encoder layer that decouples the relationship between the key and value vectors and yields attention weights that are identifiable up to the desired input length. We demonstrate the applicability of such variants through empirical evaluation on varied text classification tasks. The implementations are available at https://github.com/declare-lab/identifiable-transformers.
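For context, the sketch below shows a standard scaled dot-product self-attention head, whose softmax output is the attention-weight distribution discussed in the abstract, alongside a hypothetical variant in which the value projection is given a dimension independent of the key projection. The class names, dimensions, and the exact form of the decoupling are illustrative assumptions only and do not reproduce the authors' encoder-layer variant; see the declare-lab/identifiable-transformers repository for the actual implementation.

# Minimal sketch (PyTorch assumed); illustrative only, not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionHead(nn.Module):
    """Standard scaled dot-product self-attention head.

    The softmax output `attn` is the attention-weight distribution whose
    identifiability the paper analyses.
    """
    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_head)
        self.k = nn.Linear(d_model, d_head)
        self.v = nn.Linear(d_model, d_head)

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
        attn = F.softmax(scores, dim=-1)  # (batch, seq_len, seq_len)
        return attn @ v, attn

class DecoupledHead(nn.Module):
    """Hypothetical variant: the value projection no longer shares its
    dimension with the key projection, loosening the key/value coupling.
    Dimensions here are assumptions for illustration."""
    def __init__(self, d_model: int, d_key: int, d_value: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_key)
        self.k = nn.Linear(d_model, d_key)
        self.v = nn.Linear(d_model, d_value)  # independent value dimension

    def forward(self, x: torch.Tensor):
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
        attn = F.softmax(scores, dim=-1)
        return attn @ v, attn

if __name__ == "__main__":
    x = torch.randn(2, 16, 64)            # (batch, seq_len, d_model)
    out, attn = SelfAttentionHead(64, 64)(x)
    print(out.shape, attn.shape)          # (2, 16, 64), (2, 16, 16)

The second class simply makes the value dimension a free hyperparameter; the paper's point is that how key and value vectors are tied affects whether the attention weights are uniquely recoverable for a given head input and output.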
Anthology ID:
2021.acl-long.94
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
1172–1182
URL:
https://aclanthology.org/2021.acl-long.94
DOI:
10.18653/v1/2021.acl-long.94
Cite (ACL):
Rishabh Bhardwaj, Navonil Majumder, Soujanya Poria, and Eduard Hovy. 2021. More Identifiable yet Equally Performant Transformers for Text Classification. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1172–1182, Online. Association for Computational Linguistics.
Cite (Informal):
More Identifiable yet Equally Performant Transformers for Text Classification (Bhardwaj et al., ACL-IJCNLP 2021)
PDF:
https://preview.aclanthology.org/starsem-semeval-split/2021.acl-long.94.pdf
Video:
https://preview.aclanthology.org/starsem-semeval-split/2021.acl-long.94.mp4
Code
declare-lab/identifiable-transformers
Data
AG News | IMDb Movie Reviews | SNLI | SST