Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes
Yusuke Hirota, Jerone Andrews, Dora Zhao, Orestis Papakyriakopoulos, Apostolos Modas, Yuta Nakashima, Alice Xiang
Abstract
We tackle societal bias in image-text datasets by removing spurious correlations between protected groups and image attributes. Traditional methods only target labeled attributes, ignoring biases from unlabeled ones. Using text-guided inpainting models, our approach ensures protected group independence from all attributes and mitigates inpainting biases through data filtering. Evaluations on multi-label image classification and image captioning tasks show our method effectively reduces bias without compromising performance across various models. Specifically, we achieve an average societal bias reduction of 46.1% in leakage-based bias metrics for multi-label classification and 74.8% for image captioning.
- Anthology ID:
- 2024.emnlp-main.471
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 8249–8267
- URL:
- https://preview.aclanthology.org/add_missing_videos/2024.emnlp-main.471/
- DOI:
- 10.18653/v1/2024.emnlp-main.471
- Cite (ACL):
- Yusuke Hirota, Jerone Andrews, Dora Zhao, Orestis Papakyriakopoulos, Apostolos Modas, Yuta Nakashima, and Alice Xiang. 2024. Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 8249–8267, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes (Hirota et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2024.emnlp-main.471.pdf