Abstract
Local model interpretation methods explain individual predictions by assigning an importance value to each input feature. This value is often determined by measuring the change in confidence when a feature is removed. However, the confidence of neural networks is not a robust measure of model uncertainty. This issue makes reliably judging the importance of the input features difficult. We address this by changing the test-time behavior of neural networks using Deep k-Nearest Neighbors. Without harming text classification accuracy, this algorithm provides a more robust uncertainty metric which we use to generate feature importance values. The resulting interpretations better align with human perception than baseline methods. Finally, we use our interpretation method to analyze model predictions on dataset annotation artifacts.
- Anthology ID:
- W18-5416
- Volume:
- Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
- Month:
- November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Tal Linzen, Grzegorz Chrupała, Afra Alishahi
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 136–144
- URL:
- https://aclanthology.org/W18-5416
- DOI:
- 10.18653/v1/W18-5416
- Cite (ACL):
- Eric Wallace, Shi Feng, and Jordan Boyd-Graber. 2018. Interpreting Neural Networks with Nearest Neighbors. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 136–144, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Interpreting Neural Networks with Nearest Neighbors (Wallace et al., EMNLP 2018)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/W18-5416.pdf
- Data
- MPQA Opinion Corpus, SNLI, SST
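
The leave-one-out idea in the abstract (score a feature by how much an uncertainty measure drops when that feature is removed) can be illustrated with a small sketch. This is not the authors' implementation: the 2-D word vectors, the averaged "representation", the six-example training set, and the function names `represent`, `conformity`, and `leave_one_out_importance` are all hypothetical, and the neighbor-agreement score is computed on a single representation rather than across the layers of a trained network.

```python
import math

# Hypothetical 2-D word vectors, invented purely for illustration.
EMBED = {
    "great": (1.0, 0.0),
    "good": (0.8, 0.1),
    "awful": (-1.0, 0.0),
    "bad": (-0.8, 0.1),
    "movie": (-0.05, 0.5),
    "plot": (0.0, 0.4),
}


def represent(tokens):
    """Average the word vectors; a stand-in for a hidden-layer representation."""
    if not tokens:
        return (0.0, 0.0)
    xs = [EMBED[t][0] for t in tokens]
    ys = [EMBED[t][1] for t in tokens]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Toy labeled training set (assumed, not taken from SST, SNLI, or MPQA).
TRAIN = [
    (["great", "movie"], "pos"),
    (["good", "plot"], "pos"),
    (["good", "movie"], "pos"),
    (["awful", "movie"], "neg"),
    (["bad", "plot"], "neg"),
    (["bad", "movie"], "neg"),
]
TRAIN_REPS = [(represent(toks), label) for toks, label in TRAIN]


def conformity(tokens, label, k=3):
    """Fraction of the k nearest training points whose label equals `label`."""
    rep = represent(tokens)
    nearest = sorted(TRAIN_REPS, key=lambda item: math.dist(rep, item[0]))[:k]
    return sum(1 for _, y in nearest if y == label) / k


def leave_one_out_importance(tokens, label, k=3):
    """Importance of each token = drop in conformity when that token is removed."""
    base = conformity(tokens, label, k)
    scores = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        scores.append((tok, base - conformity(reduced, label, k)))
    return scores


scores = leave_one_out_importance(["great", "movie"], "pos")
for token, importance in scores:
    print(f"{token}: {importance:.2f}")
```

In this toy setup, removing "great" pulls the input toward negative training examples and lowers the neighbor agreement, while removing "movie" leaves it unchanged, so "great" receives the larger importance value.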