Contrastive Explanations for Model Interpretability
Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, Yoav Goldberg
Abstract
Contrastive explanations clarify why an event occurred in contrast to another. They are inherently intuitive to humans to both produce and comprehend. We propose a method to produce contrastive explanations in the latent space, via a projection of the input representation, such that only the features that differentiate two potential decisions are captured. Our modification allows model behavior to consider only contrastive reasoning, and uncover which aspects of the input are useful for and against particular decisions. Our contrastive explanations can additionally answer for which label, and against which alternative label, a given input feature is useful. We produce contrastive explanations via both high-level abstract concept attribution and low-level input token/span attribution for two NLP classification benchmarks. Our findings demonstrate the ability of label-contrastive explanations to provide fine-grained interpretability of model decisions.
- Anthology ID:
- 2021.emnlp-main.120
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1597–1611
- URL:
- https://aclanthology.org/2021.emnlp-main.120
- DOI:
- 10.18653/v1/2021.emnlp-main.120
- Cite (ACL):
- Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, and Yoav Goldberg. 2021. Contrastive Explanations for Model Interpretability. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1597–1611, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Contrastive Explanations for Model Interpretability (Jacovi et al., EMNLP 2021)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2021.emnlp-main.120.pdf
- Code:
- allenai/contrastive-explanations
- Data:
- MultiNLI, SNLI
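The abstract describes the latent-space projection only at a high level. Below is a minimal sketch under one stated assumption: the classifier ends in a linear layer with one weight row per label. Under that assumption, the direction that separates a "fact" label from a "foil" label is the difference of their weight rows, `u = w_fact - w_foil`, and projecting the representation `h` onto `u` keeps only the component that changes the fact-versus-foil logit gap. The helper names, the pure-Python vectors, and the toy numbers are hypothetical illustrations, not the authors' released code in allenai/contrastive-explanations.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def contrastive_direction(w_fact: Vector, w_foil: Vector) -> Vector:
    """Hypothetical helper: u = w_fact - w_foil, the difference between the
    (assumed linear) classifier weight rows of the fact and foil labels."""
    return [f - g for f, g in zip(w_fact, w_foil)]


def contrastive_projection(h: Vector, u: Vector) -> Vector:
    """Project h onto span(u): (u.h / u.u) * u.

    Components of h orthogonal to u do not affect the fact-vs-foil
    logit gap, so they are discarded."""
    norm_sq = dot(u, u)
    if norm_sq == 0.0:
        raise ValueError("fact and foil weight rows are identical")
    scale = dot(u, h) / norm_sq
    return [scale * x for x in u]


def logit_gap(h: Vector, w_fact: Vector, b_fact: float,
              w_foil: Vector, b_foil: float) -> float:
    """Fact logit minus foil logit for a linear classification head."""
    return (dot(w_fact, h) + b_fact) - (dot(w_foil, h) + b_foil)


def feature_contributions(h: Vector, u: Vector) -> Vector:
    """Per-dimension share of the bias-free logit gap, u_i * h_i.
    Positive entries favor the fact label; negative entries favor the foil."""
    return [ui * hi for ui, hi in zip(u, h)]


if __name__ == "__main__":
    # Toy 3-dimensional representation and two label rows (made-up numbers).
    w_fact, b_fact = [1.0, 0.0, 0.5], 0.25
    w_foil, b_foil = [0.0, 1.0, 0.5], -0.5
    h = [1.0, 2.0, -1.0]

    u = contrastive_direction(w_fact, w_foil)
    p = contrastive_projection(h, u)
    print(p)
    print(logit_gap(h, w_fact, b_fact, w_foil, b_foil))  # -0.25
    print(logit_gap(p, w_fact, b_fact, w_foil, b_foil))  # -0.25 (unchanged)
    print(feature_contributions(h, u))
```

The projection leaves `u·h` unchanged, and the bias-free logit gap equals `u·h`, so the model's preference between the two labels is preserved while every direction irrelevant to that particular choice is removed. In the toy example, the second dimension pushes toward the foil and the first toward the fact.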