@inproceedings{sushil-etal-2018-rule,
    title = "Rule induction for global explanation of trained models",
    author = "Sushil, Madhumita  and
      {\v{S}}uster, Simon  and
      Daelemans, Walter",
    editor = "Linzen, Tal  and
      Chrupa{\l}a, Grzegorz  and
      Alishahi, Afra",
    booktitle = "Proceedings of the 2018 {EMNLP} Workshop {B}lackbox{NLP}: Analyzing and Interpreting Neural Networks for {NLP}",
    month = nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/W18-5411/",
    doi = "10.18653/v1/W18-5411",
    pages = "82--97",
    abstract = "Understanding the behavior of a trained network and finding explanations for its outputs is important for improving the network{'}s performance and generalization ability, and for ensuring trust in automated systems. Several approaches have previously been proposed to identify and visualize the most important features by analyzing a trained network. However, the relations between different features and classes are lost in most cases. We propose a technique to induce sets of if-then-else rules that capture these relations to globally explain the predictions of a network. We first calculate the importance of the features in the trained network. We then weigh the original inputs with these feature importance scores, simplify the transformed input space, and finally fit a rule induction model to explain the model predictions. We find that the output rule-sets can explain the predictions of a neural network trained for 4-class text classification from the 20 newsgroups dataset to a macro-averaged F-score of 0.80. We make the code available at \url{https://github.com/clips/interpret_with_rules}."
}