Gender Bias in Instruction-Guided Speech Synthesis Models

Chun-Yi Kuan, Hung-yi Lee


Abstract
Recent advancements in controllable expressive speech synthesis, especially in text-to-speech (TTS) models, have allowed for the generation of speech with specific styles guided by textual descriptions, known as style prompts. While this development enhances the flexibility and naturalness of synthesized speech, there remains a significant gap in understanding how these models handle vague or abstract style prompts. This study investigates potential gender bias in how models interpret occupation-related prompts, specifically examining their responses to instructions such as “Act like a nurse”. We explore whether these models tend to amplify gender stereotypes when interpreting such prompts. Our experimental results reveal that these models exhibit gender bias for certain occupations, and that models of different sizes show varying degrees of this bias across those occupations.
Anthology ID:
2025.findings-naacl.298
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5387–5413
URL:
https://preview.aclanthology.org/Author-page-Marten-During-lu/2025.findings-naacl.298/
Cite (ACL):
Chun-Yi Kuan and Hung-yi Lee. 2025. Gender Bias in Instruction-Guided Speech Synthesis Models. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 5387–5413, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Gender Bias in Instruction-Guided Speech Synthesis Models (Kuan & Lee, Findings 2025)
PDF:
https://preview.aclanthology.org/Author-page-Marten-During-lu/2025.findings-naacl.298.pdf