Controlling Bias Exposure for Fair Interpretable Predictions

Zexue He, Yu Wang, Julian McAuley, Bodhisattwa Prasad Majumder


Abstract
Recent work on reducing bias in NLP models usually focuses on protecting or isolating information related to a sensitive attribute (like gender or race). However, when sensitive information is semantically entangled with the task information of the input, e.g., when gender information is predictive of a profession, a fair trade-off between task performance and bias mitigation is difficult to achieve. Existing approaches perform this trade-off by eliminating bias information from the latent space, which offers no control over how much bias actually needs to be removed. We argue that a favorable debiasing method should use sensitive information ‘fairly’, rather than blindly eliminating it (Caliskan et al., 2017; Sun et al., 2019; Bogen et al., 2020). In this work, we provide a novel debiasing algorithm by adjusting the predictive model’s belief to (1) ignore the sensitive information if it is not useful for the task; and (2) use sensitive information only as minimally necessary for the prediction (while also incurring a penalty). Experimental results on two text classification tasks (influenced by gender) and an open-ended generation task (influenced by race) indicate that our model achieves a desirable trade-off between debiasing and task performance, while also producing debiased rationales as evidence.
Anthology ID:
2022.findings-emnlp.431
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5854–5866
URL:
https://preview.aclanthology.org/icon-24-ingestion/2022.findings-emnlp.431/
DOI:
10.18653/v1/2022.findings-emnlp.431
Cite (ACL):
Zexue He, Yu Wang, Julian McAuley, and Bodhisattwa Prasad Majumder. 2022. Controlling Bias Exposure for Fair Interpretable Predictions. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5854–5866, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Controlling Bias Exposure for Fair Interpretable Predictions (He et al., Findings 2022)
PDF:
https://preview.aclanthology.org/icon-24-ingestion/2022.findings-emnlp.431.pdf
Video:
https://preview.aclanthology.org/icon-24-ingestion/2022.findings-emnlp.431.mp4