Uncertainty and Inclusivity in Gender Bias Annotation: An Annotation Taxonomy and Annotated Datasets of British English Text

Lucy Havens, Melissa Terras, Benjamin Bach, Beatrice Alex


Abstract
Mitigating harms from gender-biased language in Natural Language Processing (NLP) systems remains a challenge, and the situated nature of language means bias is inescapable in NLP data. Though efforts to mitigate gender bias in NLP are numerous, they often define gender and bias vaguely, consider only two genders, and do not incorporate uncertainty into models. To address these limitations, in this paper we present a taxonomy of gender-biased language and apply it to create annotated datasets. We created the taxonomy and annotated data with the aim of making gender bias in language transparent. If biases are communicated clearly, varieties of biased language can be better identified and measured. Our taxonomy contains eleven types of gender bias, inclusive of people whose gender expressions do not fit into the binary conceptions of woman and man and of people whose gender differs from the gender they were assigned at birth, while also allowing annotators to document unknown gender information. The taxonomy and annotated data will, in future work, underpin analysis and more equitable language model development.
Anthology ID:
2022.gebnlp-1.4
Original:
2022.gebnlp-1.4v1
Version 2:
2022.gebnlp-1.4v2
Volume:
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month:
July
Year:
2022
Address:
Seattle, Washington
Venue:
GeBNLP
Publisher:
Association for Computational Linguistics
Pages:
30–57
URL:
https://aclanthology.org/2022.gebnlp-1.4
DOI:
10.18653/v1/2022.gebnlp-1.4
Cite (ACL):
Lucy Havens, Melissa Terras, Benjamin Bach, and Beatrice Alex. 2022. Uncertainty and Inclusivity in Gender Bias Annotation: An Annotation Taxonomy and Annotated Datasets of British English Text. In Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 30–57, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):
Uncertainty and Inclusivity in Gender Bias Annotation: An Annotation Taxonomy and Annotated Datasets of British English Text (Havens et al., GeBNLP 2022)
PDF:
https://preview.aclanthology.org/nodalida-main-page/2022.gebnlp-1.4.pdf