@inproceedings{wen-etal-2025-evaluating,
title = "Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective",
author = "Wen, Yuchen and
Bi, Keping and
Chen, Wei and
Guo, Jiafeng and
Cheng, Xueqi",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/display_plenaries/2025.findings-acl.263/",
pages = "5081--5097",
ISBN = "979-8-89176-256-5",
abstract = "As large language models (LLMs) become an important way of information access, there have been increasing concerns that LLMs may intensify the spread of unethical content, including implicit bias that hurts certain populations without explicit harmful words. In this paper, we conduct a rigorous evaluation of LLMs' implicit bias towards certain demographics by attacking them from a psychometric perspective to elicit agreements to biased viewpoints. Inspired by psychometric principles in cognitive and social psychology, we propose three attack approaches, i.e., Disguise, Deception, and Teaching. Incorporating the corresponding attack instructions, we built two benchmarks: (1) a bilingual dataset with biased statements covering four bias types (2.7K instances) for extensive comparative analysis, and (2) BUMBLE, a larger benchmark spanning nine common bias types (12.7K instances) for comprehensive evaluation. Extensive evaluation of popular commercial and open-source LLMs shows that our methods can elicit LLMs' inner bias more effectively than competitive baselines. Our attack methodology and benchmarks offer an effective means of assessing the ethical risks of LLMs, driving progress toward greater accountability in their development."
}