An Empirical Comparison of Instance Attribution Methods for NLP
Pouya Pezeshkpour, Sarthak Jain, Byron Wallace, Sameer Singh
Abstract
Widespread adoption of deep models has motivated a pressing need for approaches to interpret network outputs and to facilitate model debugging. Instance attribution methods constitute one means of accomplishing these goals by retrieving training instances that (may have) led to a particular prediction. Influence functions (IF; Koh and Liang 2017) provide machinery for doing this by quantifying the effect that perturbing individual train instances would have on a specific test prediction. However, even approximating the IF is computationally expensive, to a degree that may be prohibitive in many cases. Might simpler approaches (e.g., retrieving train examples most similar to a given test point) perform comparably? In this work, we evaluate the degree to which different potential instance attribution methods agree with respect to the importance of training samples. We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods (such as IFs), but that nonetheless exhibit desirable characteristics similar to more complex attribution methods. Code for all methods and experiments in this paper is available at: https://github.com/successar/instance_attributions_NLP.
- Anthology ID:
- 2021.naacl-main.75
- Volume:
- Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Editors:
- Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 967–975
- URL:
- https://aclanthology.org/2021.naacl-main.75
- DOI:
- 10.18653/v1/2021.naacl-main.75
- Cite (ACL):
- Pouya Pezeshkpour, Sarthak Jain, Byron Wallace, and Sameer Singh. 2021. An Empirical Comparison of Instance Attribution Methods for NLP. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 967–975, Online. Association for Computational Linguistics.
- Cite (Informal):
- An Empirical Comparison of Instance Attribution Methods for NLP (Pezeshkpour et al., NAACL 2021)
- PDF:
- https://preview.aclanthology.org/naacl-24-ws-corrections/2021.naacl-main.75.pdf
- Code:
- successar/instance_attributions_NLP
- Data:
- MultiNLI, SST, SST-2
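
The abstract contrasts influence functions with a simpler approach: retrieving the train examples most similar to a given test point. The following is a minimal sketch of that kind of similarity-based retrieval, not the authors' implementation (their code is in the repository linked above). It assumes each instance has already been mapped to a fixed-length, nonzero representation vector and uses cosine similarity as the similarity measure; the function names `cosine` and `top_k_similar` and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two nonzero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_similar(train_reprs, test_repr, k=3):
    """Return indices of the k training vectors most similar to test_repr.

    Ties are broken by the lower training index so the ranking is deterministic.
    """
    scored = [(cosine(x, test_repr), i) for i, x in enumerate(train_reprs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# Toy representations (assumed, for illustration only).
train_reprs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
test_repr = [1.0, 0.1]
nearest = top_k_similar(train_reprs, test_repr, k=2)
```

Unlike an influence-function estimate, a ranking of this kind needs no gradients or Hessian approximations, which is why the abstract asks whether such retrieval might perform comparably.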