Rethinking Prompt-based Debiasing in Large Language Model

Xinyi Yang, Runzhe Zhan, Shu Yang, Junchao Wu, Lidia S. Chao, Derek F. Wong


Abstract
Investigating bias in large language models (LLMs) is crucial for developing trustworthy AI. While prompt-based debiasing through prompt engineering is common, its effectiveness relies on the assumption that models inherently understand biases. Our study systematically analyzed this assumption using the BBQ and StereoSet benchmarks on both open-source models and commercial GPT models. Experimental results indicate that prompt-based debiasing is often superficial; for instance, the Llama2-7B-Chat model misclassified over 90% of unbiased content as biased, despite achieving high accuracy in identifying bias issues on the BBQ dataset. Additionally, specific evaluation and question settings in bias benchmarks often lead LLMs to choose “evasive answers”, disregarding the core of the question and the relevance of the response to its context. Moreover, the apparent success of previous methods may stem from flawed evaluation metrics. Our research highlights a potential “false prosperity” in prompt-based debiasing efforts and emphasizes the need to rethink bias evaluation metrics to ensure truly trustworthy AI. We will release our data and code upon acceptance.
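To make the probing setup concrete, the sketch below shows one way a prompt-based bias classification probe of the kind described in the abstract could be run with the Hugging Face transformers pipeline. The checkpoint name, prompt wording, and answer-parsing logic are illustrative assumptions and are not the authors' released code.

```python
# Minimal sketch (assumed setup, not the paper's released code): ask a chat model
# whether a statement is biased, then check how it labels clearly unbiased text.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",  # assumed checkpoint; requires access approval
)

def classify_bias(statement: str) -> str:
    # Illustrative prompt; the paper's exact instructions may differ.
    prompt = (
        "Decide whether the following statement expresses a social bias. "
        "Answer with exactly one word: 'biased' or 'unbiased'.\n\n"
        f"Statement: {statement}\nAnswer:"
    )
    answer = generator(
        prompt, max_new_tokens=5, do_sample=False, return_full_text=False
    )[0]["generated_text"].strip().lower()
    # "unbiased" contains "biased", so check for the longer label first.
    return "unbiased" if "unbiased" in answer else "biased"

# A purely factual, unbiased statement; a model that answers "biased" here
# illustrates the superficial bias understanding the paper reports.
print(classify_bias("The meeting was rescheduled to Thursday afternoon."))
```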
Anthology ID:
2025.findings-acl.1361
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues:
Findings | WS
Publisher:
Association for Computational Linguistics
Pages:
26538–26553
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.1361/
Cite (ACL):
Xinyi Yang, Runzhe Zhan, Shu Yang, Junchao Wu, Lidia S. Chao, and Derek F. Wong. 2025. Rethinking Prompt-based Debiasing in Large Language Model. In Findings of the Association for Computational Linguistics: ACL 2025, pages 26538–26553, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Rethinking Prompt-based Debiasing in Large Language Model (Yang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.1361.pdf