Abstract
Evaluating the bias of LLMs becomes increasingly crucial with their rapid development. However, existing evaluation approaches rely on fixed-form outputs and cannot adapt to the flexible open-text generation scenarios of LLMs (e.g., sentence completion and question answering). To address this, we introduce BiasAlert, a plug-and-play tool designed to detect social bias in the open-text generations of LLMs. BiasAlert integrates external human knowledge with its inherent reasoning capabilities to detect bias reliably. Extensive experiments demonstrate that BiasAlert significantly outperforms existing state-of-the-art methods such as GPT-4-as-Judge in detecting bias. Furthermore, through application studies, we showcase the utility of BiasAlert for reliable LLM fairness evaluation and bias mitigation across various scenarios. The model and code will be publicly released.
- Anthology ID: 2024.emnlp-main.820
- Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 14778–14790
- URL: https://preview.aclanthology.org/add_missing_videos/2024.emnlp-main.820/
- DOI: 10.18653/v1/2024.emnlp-main.820
- Cite (ACL): Zhiting Fan, Ruizhe Chen, Ruiling Xu, and Zuozhu Liu. 2024. BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 14778–14790, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs (Fan et al., EMNLP 2024)
- PDF: https://preview.aclanthology.org/add_missing_videos/2024.emnlp-main.820.pdf