Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models

Maria Teleki, Xiangjue Dong, Haoran Liu, James Caverlee


Abstract
Masculine discourse words are discourse terms that are both socially normative and statistically associated with male speakers. We propose a twofold framework for (i) the large-scale discovery and analysis of gendered discourse words in spoken content via our Gendered Discourse Correlation Framework; and (ii) the measurement of the gender bias associated with these words in LLMs via our Discourse Word-Embedding Association Test. We focus our study on podcasts, a popular and growing form of social media, analyzing 15,117 podcast episodes. We analyze correlations between gender and discourse words, which we discover via LDA and BERTopic. We find that gendered discourse-based masculine defaults exist in the domains of business, technology/politics, and video games, indicating that these gendered discourse words are socially influential. Next, we study the representation of these words in a state-of-the-art LLM embedding model from OpenAI, and find that the masculine discourse words have a more stable and robust representation than the feminine discourse words, which may result in better system performance on downstream tasks for men. Hence, men are rewarded for their discourse patterns with better system performance, and this embedding disparity constitutes a representational harm and a masculine default.
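To illustrate the kind of measurement an embedding association test makes, the following is a minimal sketch of the standard WEAT effect size. It is not the paper's Discourse Word-Embedding Association Test, whose exact formulation is not given in the abstract. The function names and the 2-D toy vectors are assumptions made for illustration; a real analysis would use embeddings of discourse words and gender attribute words from an embedding model.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """Mean cosine similarity of w to attribute set A minus that to set B."""
    return float(np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B]))

def weat_effect_size(X, Y, A, B):
    """Standard WEAT effect size: difference of the mean associations of the
    two target sets, divided by the sample standard deviation of the
    associations over all target words."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    pooled_std = np.std(sx + sy, ddof=1)
    return float((np.mean(sx) - np.mean(sy)) / pooled_std)

# Hypothetical toy vectors (assumption): stand-ins for real embeddings.
A = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]  # "male" attribute words
B = [np.array([0.0, 1.0]), np.array([0.1, 0.9])]  # "female" attribute words
X = [np.array([1.0, 0.1]), np.array([0.8, 0.2])]  # candidate masculine discourse words
Y = [np.array([0.1, 1.0]), np.array([0.2, 0.8])]  # candidate feminine discourse words

effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {effect:.3f}")
```

A positive effect size means the words in X sit closer to attribute set A than the words in Y do; swapping X and Y flips the sign.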
Anthology ID:
2025.sicon-1.7
Volume:
Proceedings of the Third Workshop on Social Influence in Conversations (SICon 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
James Hale, Brian Deuksin Kwon, Ritam Dutt
Venues:
SICon | WS
Publisher:
Association for Computational Linguistics
Pages:
90–96
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.sicon-1.7/
Cite (ACL):
Maria Teleki, Xiangjue Dong, Haoran Liu, and James Caverlee. 2025. Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models. In Proceedings of the Third Workshop on Social Influence in Conversations (SICon 2025), pages 90–96, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models (Teleki et al., SICon 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.sicon-1.7.pdf