Sports and Women’s Sports: Gender Bias in Text Generation with Olympic Data

Laura Biester


Abstract
Large Language Models (LLMs) have been shown in prior work to be biased: they generate text that reflects stereotypical views of the world or that is not representative of the viewpoints and values of historically marginalized demographic groups. In this work, we propose using data from parallel men’s and women’s events at the Olympic Games to investigate different forms of gender bias in language models. We define three metrics to measure bias and find that models are consistently biased against women when gender is ambiguous in the prompt. In this case, the model frequently retrieves only the results of the men’s event, with or without acknowledging them as such, revealing pervasive gender bias in LLMs in the context of athletics.
Anthology ID:
2025.naacl-short.17
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
195–205
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-short.17/
Cite (ACL):
Laura Biester. 2025. Sports and Women’s Sports: Gender Bias in Text Generation with Olympic Data. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pages 195–205, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Sports and Women’s Sports: Gender Bias in Text Generation with Olympic Data (Biester, NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-short.17.pdf