BELIEVE: Belief-Enhanced Instruction Generation and Augmentation for Zero-Shot Bias Mitigation

Lisa Bauer, Ninareh Mehrabi, Palash Goyal, Kai-Wei Chang, Aram Galstyan, Rahul Gupta


Abstract
Language models, pre-trained on large amounts of unmoderated content, have been shown to contain societal biases. Mitigating such biases typically requires access to model parameters and training schemas. In this work, we address bias mitigation at inference time, such that it can be applied to any black-box model. To this end, we propose a belief generation and augmentation framework, BELIEVE, that demonstrates effective bias mitigation for natural language generation by augmenting input prompts with automatically generated instruction-based beliefs. Our framework eases the bottleneck of manually crafting these instruction-based beliefs by extending a recently proposed iterative in-context learning framework to automatically generate beliefs via a language model. We assess the impact of this system on fairness, and demonstrate effective bias mitigation on pretrained and instruction-tuned models for both sentiment and regard with respect to multiple protected classes including race, gender, and political ideology.
Anthology ID:
2024.trustnlp-1.20
Volume:
Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Anaelia Ovalle, Kai-Wei Chang, Yang Trista Cao, Ninareh Mehrabi, Jieyu Zhao, Aram Galstyan, Jwala Dhamala, Anoop Kumar, Rahul Gupta
Venues:
TrustNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
239–251
URL:
https://aclanthology.org/2024.trustnlp-1.20
DOI:
10.18653/v1/2024.trustnlp-1.20
Cite (ACL):
Lisa Bauer, Ninareh Mehrabi, Palash Goyal, Kai-Wei Chang, Aram Galstyan, and Rahul Gupta. 2024. BELIEVE: Belief-Enhanced Instruction Generation and Augmentation for Zero-Shot Bias Mitigation. In Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024), pages 239–251, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
BELIEVE: Belief-Enhanced Instruction Generation and Augmentation for Zero-Shot Bias Mitigation (Bauer et al., TrustNLP-WS 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.trustnlp-1.20.pdf