Abstract
Potential gender biases in Wikipedia’s content can contribute to biased behaviors in a variety of downstream NLP systems. Yet, efforts to understand which inequalities in the portrayal of women and men occur in Wikipedia have so far focused only on *biographies*, leaving open the question of how often such harmful patterns occur in other topics. In this paper, we investigate gender-related asymmetries in Wikipedia titles from *all domains*. We find that only half of gender-related articles, i.e., articles with words such as *women* or *male* in their titles, have symmetrical counterparts that describe the same concept for the other gender (and clearly state it in their titles). Among the remaining imbalanced cases, the vast majority of articles concern sports- and social-related issues. We provide insights into how such asymmetries can influence other Wikipedia components and propose steps towards reducing the frequency of the observed patterns.
- Anthology ID:
- 2021.gebnlp-1.9
- Volume:
- Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Editors:
- Marta Costa-jussa, Hila Gonen, Christian Hardmeier, Kellie Webster
- Venue:
- GeBNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 75–85
- URL:
- https://aclanthology.org/2021.gebnlp-1.9
- DOI:
- 10.18653/v1/2021.gebnlp-1.9
- Cite (ACL):
- Agnieszka Falenska and Özlem Çetinoğlu. 2021. Assessing Gender Bias in Wikipedia: Inequalities in Article Titles. In Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing, pages 75–85, Online. Association for Computational Linguistics.
- Cite (Informal):
- Assessing Gender Bias in Wikipedia: Inequalities in Article Titles (Falenska & Çetinoğlu, GeBNLP 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2021.gebnlp-1.9.pdf
- Code:
- agnieszkafalenska/gebnlp2021
- Data:
- GAP Coreference Dataset
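The abstract describes a title-level check: find articles whose titles contain gendered words such as *women* or *male*, then ask whether a counterpart title for the other gender exists. The Python below is a minimal sketch of that idea, not the authors' method (their code is in the agnieszkafalenska/gebnlp2021 repository listed above). It assumes a small, hypothetical swap list (`GENDER_SWAP`) and treats a counterpart as present only when the word-swapped title matches an existing title exactly. The function names `swap_gender` and `find_asymmetries` are illustrative.

```python
import re

# Hypothetical swap list; the paper's actual set of gendered words is an assumption here.
GENDER_SWAP = {
    "women": "men", "men": "women",
    "woman": "man", "man": "woman",
    "female": "male", "male": "female",
    "women's": "men's", "men's": "women's",
    "girls": "boys", "boys": "girls",
}

# Split a title into word tokens (letters and apostrophes) and everything in between,
# so that e.g. "Human" is never mistaken for "man".
TOKEN = re.compile(r"[A-Za-z']+|[^A-Za-z']+")


def swap_gender(title):
    """Return the title with gendered words swapped, or None if it contains none."""
    out, changed = [], False
    for part in TOKEN.findall(title):
        repl = GENDER_SWAP.get(part.lower())
        if repl is None:
            out.append(part)
            continue
        # Preserve initial capitalization (Wikipedia titles start with a capital letter).
        if part[0].isupper():
            repl = repl[0].upper() + repl[1:]
        out.append(repl)
        changed = True
    return "".join(out) if changed else None


def find_asymmetries(titles):
    """Split gender-related titles into those with and without an exact-match counterpart."""
    title_set = set(titles)
    symmetric, asymmetric = [], []
    for title in titles:
        counterpart = swap_gender(title)
        if counterpart is None:
            continue  # not a gender-related title under this word list
        (symmetric if counterpart in title_set else asymmetric).append(title)
    return symmetric, asymmetric


if __name__ == "__main__":
    sample = [
        "Women's national football team",
        "Men's national football team",
        "Women in science",
        "Chess",
    ]
    sym, asym = find_asymmetries(sample)
    print("symmetric:", sym)
    print("asymmetric:", asym)
```

Exact string matching after word swapping is deliberately simple. It misses counterparts whose titles are phrased differently or leave the default gender unmarked (for example, a men's competition titled without the word *men*), which is one reason a real analysis needs more than this sketch.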