Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates

Xiaochuang Han; Yulia Tsvetkov

doi:10.18653/v1/2021.findings-emnlp.374

Abstract

Among the most critical limitations of deep learning NLP models are their lack of interpretability and their reliance on spurious correlations. Prior work has proposed various approaches to interpreting black-box models to unveil spurious correlations, but this research has primarily been used in human-computer interaction scenarios. It remains underexplored whether or how such model interpretations can be used to automatically “unlearn” confounding features. In this work, we propose influence tuning, a procedure that leverages model interpretations to update the model parameters towards a plausible interpretation (rather than an interpretation that relies on spurious patterns in the data) in addition to learning to predict the task labels. We show that in a controlled setup, influence tuning can help deconfound the model from spurious patterns in data, significantly outperforming baseline methods that use adversarial training.

Anthology ID: 2021.findings-emnlp.374
Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: Findings
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 4398–4409
URL: https://preview.aclanthology.org/Add-Cong-Liu-Florida-Atlantic-University-author-id/2021.findings-emnlp.374/
DOI: 10.18653/v1/2021.findings-emnlp.374
Cite (ACL): Xiaochuang Han and Yulia Tsvetkov. 2021. Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4398–4409, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates (Han & Tsvetkov, Findings 2021)
PDF: https://preview.aclanthology.org/Add-Cong-Liu-Florida-Atlantic-University-author-id/2021.findings-emnlp.374.pdf
Video: https://preview.aclanthology.org/Add-Cong-Liu-Florida-Atlantic-University-author-id/2021.findings-emnlp.374.mp4
Code: xhan77/influence-tuning
