Can Explanations Be Useful for Calibrating Black Box Models?

Xi Ye, Greg Durrett


Abstract
NLP practitioners often want to take existing trained models and apply them to data from new domains. While fine-tuning or few-shot learning can be used to adapt a base model, there is no single recipe for making these techniques work; moreover, one may not have access to the original model weights if it is deployed as a black box. We study how to improve a black box model’s performance on a new domain by leveraging explanations of the model’s behavior. Our approach first extracts a set of features combining human intuition about the task with model attributions generated by black box interpretation techniques, then uses a simple calibrator, in the form of a classifier, to predict whether the base model was correct or not. We experiment with our method on two tasks, extractive question answering and natural language inference, covering adaptation from several pairs of domains with limited target-domain data. The experimental results across all the domain pairs show that explanations are useful for calibrating these models, boosting accuracy when predictions do not have to be returned on every example. We further show that the calibration model transfers to some extent between tasks.
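As a rough illustration of the pipeline the abstract describes, the sketch below fits a small calibrator that predicts whether a black box model's answer was correct, then uses its scores to answer only on the examples it trusts most. Everything specific here is an assumption made for illustration: the two features (the base model's own confidence and the share of attribution mass on question tokens), the toy data, and the function names are hypothetical, and plain logistic regression stands in for whatever classifier the paper actually uses.

```python
import math

def predict_correct_prob(weights, bias, x):
    """Calibrator output: estimated probability that the base model was right."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_calibrator(features, labels, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n_feats = len(features[0])
    weights, bias = [0.0] * n_feats, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feats, 0.0
        for x, y in zip(features, labels):
            err = predict_correct_prob(weights, bias, x) - y
            for j in range(n_feats):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / len(features) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(features)
    return weights, bias

def accuracy_at_coverage(confidences, correct, coverage):
    """Accuracy on the `coverage` fraction of examples with the highest scores."""
    ranked = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    kept = ranked[:max(1, round(coverage * len(ranked)))]
    return sum(c for _, c in kept) / len(kept)

# Hypothetical features per example: [base-model confidence,
# fraction of attribution mass falling on question tokens].
TRAIN_X = [[0.9, 0.8], [0.8, 0.7], [0.85, 0.2], [0.4, 0.6], [0.3, 0.1], [0.5, 0.2]]
# Label 1 means the base model's prediction was correct on that example.
TRAIN_Y = [1, 1, 0, 1, 0, 0]

weights, bias = train_calibrator(TRAIN_X, TRAIN_Y)
scores = [predict_correct_prob(weights, bias, x) for x in TRAIN_X]
print(accuracy_at_coverage(scores, TRAIN_Y, coverage=1.0))
print(accuracy_at_coverage(scores, TRAIN_Y, coverage=0.5))
```

Lowering `coverage` trades the number of answered examples for accuracy on those that remain, which is the setting the abstract refers to when predictions do not have to be returned on every example.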
Anthology ID:
2022.acl-long.429
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6199–6212
URL:
https://aclanthology.org/2022.acl-long.429
DOI:
10.18653/v1/2022.acl-long.429
Cite (ACL):
Xi Ye and Greg Durrett. 2022. Can Explanations Be Useful for Calibrating Black Box Models? In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6199–6212, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Can Explanations Be Useful for Calibrating Black Box Models? (Ye & Durrett, ACL 2022)
PDF:
https://preview.aclanthology.org/ingest-acl-2023-videos/2022.acl-long.429.pdf
Software:
 2022.acl-long.429.software.zip
Video:
 https://preview.aclanthology.org/ingest-acl-2023-videos/2022.acl-long.429.mp4
Code
 xiye17/interpcalib
Data
GLUE, MRPC, MultiNLI, QNLI, SQuAD