Proposed Taxonomy for Gender Bias in Text; A Filtering Methodology for the Gender Generalization Subtype

Yasmeen Hitti, Eunbee Jang, Ines Moreno, Carolyne Pelletier


Abstract
The purpose of this paper is to present an empirical study on gender bias in text. Current research in this field focuses on detecting and correcting gender bias in existing machine learning models rather than approaching the issue at the dataset level. The underlying motivation is to create a dataset that could enable machines to learn to differentiate biased writing from non-biased writing. A taxonomy is proposed for structural and contextual gender biases that can manifest themselves in text. A methodology is proposed to fetch one type of structural gender bias, Gender Generalization. We explore the IMDB movie review dataset and 9 different corpora from Project Gutenberg. After irrelevant sentences are filtered out, the remaining pool of candidate sentences is sent for human validation. A total of 6123 judgments are made on 1627 sentences, and after a quality check on randomly selected sentences we obtain an accuracy of 75%. Out of the 1627 sentences, 808 sentences were labeled as Gender Generalizations. The inter-rater reliability amongst labelers was 61.14%.
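As a rough aid to reading the counts in the abstract, the short sketch below derives two ratios from them: the average number of judgments per sentence and the share of sentences labeled as Gender Generalizations. The three counts come from the abstract; the function name `summarize` and the dictionary keys are illustrative assumptions, and the sketch does not reproduce how the authors computed accuracy or inter-rater reliability.

```python
# Counts reported in the abstract.
TOTAL_JUDGMENTS = 6123
TOTAL_SENTENCES = 1627
GENERALIZATION_SENTENCES = 808


def summarize(judgments: int, sentences: int, positives: int) -> dict:
    """Derive simple ratios from the reported counts (hypothetical helper)."""
    return {
        # Average number of human judgments collected per sentence.
        "judgments_per_sentence": judgments / sentences,
        # Fraction of validated sentences labeled as Gender Generalizations.
        "generalization_share": positives / sentences,
    }


summary = summarize(TOTAL_JUDGMENTS, TOTAL_SENTENCES, GENERALIZATION_SENTENCES)
print(f"Judgments per sentence: {summary['judgments_per_sentence']:.2f}")
print(f"Share labeled Gender Generalization: {summary['generalization_share']:.1%}")
```

Under these counts, each sentence received a little under four judgments on average, and roughly half of the validated sentences were labeled as Gender Generalizations.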
Anthology ID:
W19-3802
Volume:
Proceedings of the First Workshop on Gender Bias in Natural Language Processing
Month:
August
Year:
2019
Address:
Florence, Italy
Venue:
GeBNLP
Publisher:
Association for Computational Linguistics
Pages:
8–17
URL:
https://aclanthology.org/W19-3802
DOI:
10.18653/v1/W19-3802
Cite (ACL):
Yasmeen Hitti, Eunbee Jang, Ines Moreno, and Carolyne Pelletier. 2019. Proposed Taxonomy for Gender Bias in Text; A Filtering Methodology for the Gender Generalization Subtype. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pages 8–17, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Proposed Taxonomy for Gender Bias in Text; A Filtering Methodology for the Gender Generalization Subtype (Hitti et al., GeBNLP 2019)
PDF:
https://preview.aclanthology.org/starsem-semeval-split/W19-3802.pdf
Data:
GAP Coreference Dataset
IMDb Movie Reviews