Abstract
Neural rationale models are popular for interpretable predictions of NLP tasks. In these, a selector extracts segments of the input text, called rationales, and passes these segments to a classifier for prediction. Since the rationale is the only information accessible to the classifier, it is plausibly defined as the explanation. Is such a characterization unconditionally correct? In this paper, we argue to the contrary, with both philosophical perspectives and empirical evidence suggesting that rationale models are, perhaps, less rational and interpretable than expected. We call for more rigorous evaluations of these models to ensure desired properties of interpretability are indeed achieved. The code for our experiments is at https://github.com/yimingz89/Neural-Rationale-Analysis.
- Anthology ID:
- 2022.trustnlp-1.6
- Volume:
- Proceedings of the 2nd Workshop on Trustworthy Natural Language Processing (TrustNLP 2022)
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, U.S.A.
- Editors:
- Apurv Verma, Yada Pruksachatkun, Kai-Wei Chang, Aram Galstyan, Jwala Dhamala, Yang Trista Cao
- Venue:
- TrustNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 64–73
- URL:
- https://aclanthology.org/2022.trustnlp-1.6
- DOI:
- 10.18653/v1/2022.trustnlp-1.6
- Cite (ACL):
- Yiming Zheng, Serena Booth, Julie Shah, and Yilun Zhou. 2022. The Irrationality of Neural Rationale Models. In Proceedings of the 2nd Workshop on Trustworthy Natural Language Processing (TrustNLP 2022), pages 64–73, Seattle, U.S.A. Association for Computational Linguistics.
- Cite (Informal):
- The Irrationality of Neural Rationale Models (Zheng et al., TrustNLP 2022)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2022.trustnlp-1.6.pdf
- Code:
- yimingz89/neural-rationale-analysis
- Data:
- SST