@inproceedings{bastings-etal-2019-interpretable,
title = "Interpretable Neural Predictions with Differentiable Binary Variables",
author = "Bastings, Jasmijn and
Aziz, Wilker and
Titov, Ivan",
editor = "Korhonen, Anna and
Traum, David and
M{\`a}rquez, Llu{\'i}s",
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/fix-sig-urls/P19-1284/",
doi = "10.18653/v1/P19-1284",
pages = "2963--2977",
abstract = "The success of neural networks comes hand in hand with a desire for more interpretability. We focus on text classifiers and make them more interpretable by having them provide a justification{--}a rationale{--}for their predictions. We approach this problem by jointly training two neural network models: a latent model that selects a rationale (i.e. a short and informative part of the input text), and a classifier that learns from the words in the rationale alone. Previous work proposed to assign binary latent masks to input positions and to promote short selections via sparsity-inducing penalties such as L0 regularisation. We propose a latent model that mixes discrete and continuous behaviour allowing at the same time for binary selections and gradient-based training without REINFORCE. In our formulation, we can tractably compute the expected value of penalties such as L0, which allows us to directly optimise the model towards a pre-specified text selection rate. We show that our approach is competitive with previous work on rationale extraction, and explore further uses in attention mechanisms."
}
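The abstract's "latent model that mixes discrete and continuous behaviour" refers to a stretched-and-rectified Kumaraswamy gate (the paper's "HardKuma"): samples land exactly on 0 or 1 with non-zero probability, yet the sampler stays differentiable in its shape parameters, and P(z = 0) has a closed form, making the expected L0 tractable. Below is a minimal sketch, not the authors' implementation; the parameter names `a`, `b` and the stretch bounds `l = -0.1`, `r = 1.1` are illustrative assumptions.

```python
import torch

def hardkuma_sample(a, b, l=-0.1, r=1.1, eps=1e-6):
    """Reparameterised sample: Kuma(a, b) on (0,1) -> stretch to (l, r) -> rectify."""
    u = torch.rand_like(a).clamp(eps, 1 - eps)
    # Inverse CDF of Kumaraswamy(a, b): x = (1 - (1 - u)^(1/b))^(1/a)
    x = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)
    t = l + (r - l) * x          # stretched support (l, r), with l < 0 < 1 < r
    return t.clamp(0.0, 1.0)     # rectification puts point masses at exactly 0 and 1

def prob_nonzero(a, b, l=-0.1, r=1.1):
    """P(z != 0) in closed form, so E[L0] = sum_i P(z_i != 0) is tractable."""
    x0 = (0.0 - l) / (r - l)             # pre-image of 0 under the stretch
    cdf0 = 1.0 - (1.0 - x0 ** a) ** b    # Kumaraswamy CDF at x0 = P(z == 0)
    return 1.0 - cdf0

# Toy usage: one gate per input position; in the model, a and b would come
# from the latent network conditioned on the sentence (hypothetical setup).
a = torch.nn.functional.softplus(torch.randn(6, requires_grad=True))
b = torch.full_like(a, 1.0)
z = hardkuma_sample(a, b)                 # near-binary mask over 6 positions
expected_l0 = prob_nonzero(a, b).sum()    # differentiable selection-size penalty
```

In the paper this expected L0 feeds a constrained objective that pushes the model toward a pre-specified text selection rate; the sketch above only shows the distributional building block.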