Evaluating Gender Bias of LLMs in Making Morality Judgements

Divij Bajaj; Yuanyuan Lei; Jonathan Tong; Ruihong Huang

doi:10.18653/v1/2024.findings-emnlp.928

Evaluating Gender Bias of LLMs in Making Morality Judgements

Divij Bajaj, Yuanyuan Lei, Jonathan Tong, Ruihong Huang

Abstract

Large Language Models (LLMs) have shown remarkable capabilities in a multitude of Natural Language Processing (NLP) tasks. However, these models are still not immune to limitations such as social biases, especially gender bias. This work investigates whether current closed and open-source LLMs possess gender bias, especially when asked to give moral opinions. To evaluate these models, we curate and introduce a new dataset GenMO (Gender-bias in Morality Opinions) comprising parallel short stories featuring male and female characters respectively. Specifically, we test models from the GPT family (GPT-3.5-turbo, GPT-3.5-turbo-instruct, GPT-4-turbo), Llama 3 and 3.1 families (8B/70B), Mistral-7B and Claude 3 families (Sonnet and Opus). Surprisingly, despite employing safety checks, all production-standard models we tested display significant gender bias with GPT-3.5-turbo giving biased opinions in 24% of the samples. Additionally, all models consistently favour female characters, with GPT showing bias in 68-85% of cases and Llama 3 in around 81-85% instances. Additionally, our study investigates the impact of model parameters on gender bias and explores real-world situations where LLMs reveal biases in moral decision-making.

Anthology ID:: 2024.findings-emnlp.928
Volume:: Findings of the Association for Computational Linguistics: EMNLP 2024
Month:: November
Year:: 2024
Address:: Miami, Florida, USA
Editors:: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:: Findings
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 15804–15818
Language:
URL:: https://preview.aclanthology.org/Add-Cong-Liu-Florida-Atlantic-University-author-id/2024.findings-emnlp.928/
DOI:: 10.18653/v1/2024.findings-emnlp.928
Bibkey:
Cite (ACL):: Divij Bajaj, Yuanyuan Lei, Jonathan Tong, and Ruihong Huang. 2024. Evaluating Gender Bias of LLMs in Making Morality Judgements. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 15804–15818, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):: Evaluating Gender Bias of LLMs in Making Morality Judgements (Bajaj et al., Findings 2024)
Copy Citation:
PDF:: https://preview.aclanthology.org/Add-Cong-Liu-Florida-Atlantic-University-author-id/2024.findings-emnlp.928.pdf
Data:: 2024.findings-emnlp.928.data.zip

PDF Cite Search Data Fix data