Do PLMs and Annotators Share the Same Gender Bias? Definition, Dataset, and Framework of Contextualized Gender Bias

Shucheng Zhu, Bingjie Du, Jishun Zhao, Ying Liu, Pengyuan Liu


Abstract
Pre-trained language models (PLMs) have achieved success in a variety of natural language processing (NLP) tasks. However, PLMs also introduce some disquieting safety problems, such as gender bias. Gender bias is an extremely complex issue, because different individuals may hold disparate opinions on whether the same sentence expresses harmful bias, especially when the sentence seems neutral or positive. This paper first defines the concept of contextualized gender bias (CGB), which makes it easy to measure implicit gender bias in both PLMs and annotators. We then construct CGBDataset, which contains 20k natural sentences with gendered words, from Chinese news. Similar to the task of masked language models, gendered words are masked for PLMs and annotators to judge whether a male word or a female word is more suitable. Then, we introduce CGBFrame to measure the gender bias of annotators. By comparing the results measured by PLMs and annotators, we find that though there are differences in the choices made by PLMs and annotators, they show significant consistency in general.
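
The masked-word comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's CGB definition or the CGBFrame procedure, neither of which is specified here. It assumes that each masked sentence comes with a model probability for a male word and a female word, plus a list of annotator votes. The field names `p_male`, `p_female`, and `votes`, the tie-breaking rule, and the toy values are all hypothetical.

```python
from collections import Counter


def plm_choice(p_male, p_female):
    """Return the gendered word the model scores higher at the masked slot.

    Assumption: ties go to "male"; the source does not say how ties are handled.
    """
    return "male" if p_male >= p_female else "female"


def annotator_choice(votes):
    """Return the majority vote among annotators (same tie rule as above)."""
    counts = Counter(votes)
    return "male" if counts["male"] >= counts["female"] else "female"


def agreement_rate(items):
    """Fraction of items where the model and the annotator majority agree."""
    matches = sum(
        plm_choice(item["p_male"], item["p_female"]) == annotator_choice(item["votes"])
        for item in items
    )
    return matches / len(items)


# Hypothetical toy data, not taken from CGBDataset.
items = [
    {"p_male": 0.70, "p_female": 0.30, "votes": ["male", "male", "female"]},
    {"p_male": 0.20, "p_female": 0.80, "votes": ["female", "female", "female"]},
    {"p_male": 0.55, "p_female": 0.45, "votes": ["female", "female", "male"]},
    {"p_male": 0.10, "p_female": 0.90, "votes": ["female", "male", "female"]},
]

rate = agreement_rate(items)
print(f"PLM/annotator agreement: {rate:.2f}")
```

A simple match rate like this only shows how often the two sides pick the same word; it says nothing about statistical significance, which the paper assesses with its own framework.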
Anthology ID:
2024.gebnlp-1.2
Volume:
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Agnieszka Faleńska, Christine Basta, Marta Costa-jussà, Seraphina Goldfarb-Tarrant, Debora Nozza
Venues:
GeBNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
20–32
URL:
https://preview.aclanthology.org/icon-24-ingestion/2024.gebnlp-1.2/
DOI:
10.18653/v1/2024.gebnlp-1.2
Cite (ACL):
Shucheng Zhu, Bingjie Du, Jishun Zhao, Ying Liu, and Pengyuan Liu. 2024. Do PLMs and Annotators Share the Same Gender Bias? Definition, Dataset, and Framework of Contextualized Gender Bias. In Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 20–32, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Do PLMs and Annotators Share the Same Gender Bias? Definition, Dataset, and Framework of Contextualized Gender Bias (Zhu et al., GeBNLP 2024)
PDF:
https://preview.aclanthology.org/icon-24-ingestion/2024.gebnlp-1.2.pdf