Contrastive Explanations for Model Interpretability

Alon Jacovi; Swabha Swayamdipta; Shauli Ravfogel; Yanai Elazar; Yejin Choi; Yoav Goldberg

doi:10.18653/v1/2021.emnlp-main.120

Abstract

Contrastive explanations clarify why an event occurred in contrast to another. They are intuitive for humans both to produce and to comprehend. We propose a method to produce contrastive explanations in the latent space, via a projection of the input representation, such that only the features that differentiate two potential decisions are captured. Our modification allows the model's behavior to reflect only contrastive reasoning, and reveals which aspects of the input are useful for and against particular decisions. Our contrastive explanations can additionally answer for which label, and against which alternative label, a given input feature is useful. We produce contrastive explanations via both high-level abstract concept attribution and low-level input token/span attribution for two NLP classification benchmarks. Our findings demonstrate the ability of label-contrastive explanations to provide fine-grained interpretability of model decisions.
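The central mechanism described above, projecting the representation so that only what separates two candidate labels survives, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (that lives in the allenai/contrastive-explanations repository). It assumes a linear output layer, `logits = W @ h + b`, and takes the fact-versus-foil direction to be `W[fact] - W[foil]`. The token-level part uses a hypothetical mean-of-embeddings encoder purely so that attribution is exact and easy to check. All function names here are illustrative.

```python
import numpy as np


def contrastive_projection(h, W, fact, foil):
    """Keep only the component of h that separates label `fact` from label `foil`.

    Assumption: a linear output layer, logits = W @ h + b, where row W[k] scores label k.
    """
    u = W[fact] - W[foil]                # direction along which the fact and foil logits differ
    P = np.outer(u, u) / np.dot(u, u)    # rank-1 orthogonal projection onto span(u)
    return P @ h


def contrastive_score(h, W, fact, foil):
    """Logit difference fact - foil; the bias only shifts it by a constant, so it is omitted."""
    return float((W[fact] - W[foil]) @ h)


def token_contrastive_attribution(E, W, fact, foil):
    """Per-token contribution to the fact-vs-foil logit difference.

    Hypothetical toy encoder: h = mean of the token embeddings (rows of E).
    Because this model is linear, gradient-times-input is exact, so the scores
    sum to contrastive_score(h). Positive values favour `fact` over `foil`;
    negative values favour `foil` over `fact`.
    """
    u = W[fact] - W[foil]
    return (E @ u) / E.shape[0]


if __name__ == "__main__":
    W = np.array([[1.0, 0.0, 2.0],    # label 0
                  [0.0, 1.0, 0.0],    # label 1
                  [0.5, 0.5, 0.5]])   # label 2
    E = np.array([[3.0, 0.0, 3.0],    # token 1 embedding
                  [0.0, 6.0, 6.0]])   # token 2 embedding
    h = E.mean(axis=0)

    h_c = contrastive_projection(h, W, fact=0, foil=1)
    print("fact-foil score, full h:      ", contrastive_score(h, W, 0, 1))
    print("fact-foil score, projected h: ", contrastive_score(h_c, W, 0, 1))
    print("per-token contrastive scores: ", token_contrastive_attribution(E, W, 0, 1))
```

Because the projection keeps only the component of `h` along `W[fact] - W[foil]`, the fact-minus-foil logit difference is unchanged, while everything orthogonal to that direction, which cannot affect this particular decision, is discarded. The per-token scores answer "useful for which label, against which alternative" directly through their sign.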

Anthology ID:: 2021.emnlp-main.120
Volume:: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2021
Address:: Online and Punta Cana, Dominican Republic
Editors:: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:: EMNLP
Publisher:: Association for Computational Linguistics
Pages:: 1597–1611
URL:: https://aclanthology.org/2021.emnlp-main.120
DOI:: 10.18653/v1/2021.emnlp-main.120
Cite (ACL):: Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, and Yoav Goldberg. 2021. Contrastive Explanations for Model Interpretability. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1597–1611, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):: Contrastive Explanations for Model Interpretability (Jacovi et al., EMNLP 2021)
PDF:: https://preview.aclanthology.org/ingest-2024-clasp/2021.emnlp-main.120.pdf
Video:: https://preview.aclanthology.org/ingest-2024-clasp/2021.emnlp-main.120.mp4
Code:: allenai/contrastive-explanations
Data:: MultiNLI, SNLI