Gender Representation Bias Analysis in LLM-Generated Czech and Slovenian Texts

Erik Derner, Kristina Batistič


Abstract
Large language models (LLMs) often reflect social biases present in their training data, including imbalances in how different genders are represented. While most prior work has focused on English, gender representation bias remains underexplored in morphologically rich languages where grammatical gender is pervasive. We present a method for detecting and quantifying such bias in Czech and Slovenian, using LLMs to classify gendered person references in LLM-generated narratives. Applying this method to outputs from a range of models, we find substantial variation in gender balance. While some models produce near-equal proportions of male and female references, others exhibit strong male overrepresentation. Our findings highlight the need for fine-grained bias evaluation in under-represented languages and demonstrate the potential of LLM-based annotation in this space. We make our code and data publicly available.
Anthology ID:
2025.bsnlp-1.15
Volume:
Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Jakub Piskorski, Pavel Přibáň, Preslav Nakov, Roman Yangarber, Michal Marcinczuk
Venues:
BSNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
124–135
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bsnlp-1.15/
Cite (ACL):
Erik Derner and Kristina Batistič. 2025. Gender Representation Bias Analysis in LLM-Generated Czech and Slovenian Texts. In Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025), pages 124–135, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Gender Representation Bias Analysis in LLM-Generated Czech and Slovenian Texts (Derner & Batistič, BSNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bsnlp-1.15.pdf