Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Agnieszka Faleńska | Christine Basta | Marta Costa-jussà | Karolina Stańczak | Debora Nozza
JBBQ: Japanese Bias Benchmark for Analyzing Social Biases in Large Language Models
Hitomi Yanaka | Namgi Han | Ryoma Kumon | Lu Jie | Masashi Takeshita | Ryo Sekizawa | Taisei Katô | Hiromi Arai
With the development of large language models (LLMs), social biases in these LLMs have become a pressing issue. Although there are various benchmarks for social biases across languages, the extent to which Japanese LLMs exhibit social biases has not been fully investigated. In this study, we construct the Japanese Bias Benchmark dataset for Question Answering (JBBQ) based on the English bias benchmark BBQ, with analysis of social biases in Japanese LLMs. The results show that while current open Japanese LLMs with more parameters show improved accuracies on JBBQ, their bias scores increase. In addition, prompts with a warning about social biases and chain-of-thought prompting reduce the effect of biases in model outputs, but there is room for improvement in extracting the correct evidence from contexts in Japanese. Our dataset is available at https://github.com/ynklab/JBBQ_data.
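For readers unfamiliar with how BBQ-style benchmarks are scored, the sketch below computes accuracy together with disambiguated and ambiguous bias scores, following the general formulation of Parrish et al. (2022); the field names and the "unknown" label are illustrative assumptions, not the JBBQ release format.

```python
# Minimal sketch of BBQ-style scoring; field names are assumed, not JBBQ's schema.
def bbq_scores(examples):
    """examples: list of dicts with keys 'prediction', 'gold', and 'targets_bias'
    (True if the predicted answer points to the stereotyped group); the string
    "unknown" marks the unknown option."""
    accuracy = sum(e["prediction"] == e["gold"] for e in examples) / len(examples)

    # Disambiguated bias score: share of non-unknown answers aligned with the
    # stereotype, rescaled to [-1, 1].
    answered = [e for e in examples if e["prediction"] != "unknown"]
    s_dis = (2 * sum(e["targets_bias"] for e in answered) / len(answered) - 1
             if answered else 0.0)

    # Ambiguous-context bias score scales s_dis by the error rate.
    s_amb = (1 - accuracy) * s_dis
    return accuracy, s_dis, s_amb
```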
Intersectional Bias in Japanese Large Language Models from a Contextualized Perspective
Hitomi Yanaka | Xinqi He | Lu Jie | Namgi Han | Sunjin Oh | Ryoma Kumon | Yuma Matsuoka | Kazuhiko Watabe | Yuko Itatsu
A growing number of studies have examined the social bias of rapidly developed large language models (LLMs). Although most of these studies have focused on bias occurring in a single social attribute, research in social science has shown that social bias often occurs in the form of intersectionality—the constitutive and contextualized perspective on bias arising from social attributes. In this study, we construct the Japanese benchmark inter-JBBQ, designed to evaluate intersectional bias in LLMs in a question-answering setting. Using inter-JBBQ to analyze GPT-4o and Swallow, we find that biased outputs vary depending on the context even for the same combination of social attributes.
Detecting Bias and Intersectional Bias in Italian Word Embeddings and Language Models
Alexandre Puttick | Mascha Kurpicz-Briki
Bias in Natural Language Processing (NLP) applications has become a critical issue, with many methods developed to measure and mitigate bias in word embeddings and language models. However, most approaches focus on single categories such as gender or ethnicity, neglecting the intersectionality of biases, particularly in non-English languages. This paper addresses these gaps by studying both single-category and intersectional biases in Italian word embeddings and language models. We extend existing bias metrics to Italian, introducing GG-FISE, a novel method for detecting intersectional bias while accounting for grammatical gender. We also adapt the CrowS-Pairs dataset and bias metric to Italian. Through a series of experiments using WEAT, SEAT, and LPBS tests, we identify significant biases along gender and ethnic lines, with particular attention to biases against Romanian and South Asian populations. Our results highlight the need for culturally adapted methods to detect and address biases in multilingual and intersectional contexts.
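As a point of reference for the WEAT-style tests used here, the snippet below computes the standard effect size between two target sets and two attribute sets from pre-extracted word vectors; it is a generic sketch, not the authors' GG-FISE implementation.

```python
import numpy as np

def cos(u, v):
    # Cosine similarity between two 1-D vectors.
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def weat_effect_size(X, Y, A, B):
    """X, Y: lists of target-word vectors; A, B: lists of attribute-word vectors."""
    def assoc(w):
        # Differential association of w with the two attribute sets.
        return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])
    s_x = [assoc(x) for x in X]
    s_y = [assoc(y) for y in Y]
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y, ddof=1)
```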
Power(ful) Associations: Rethinking “Stereotype” for NLP
Hannah Devinney
The tendency for Natural Language Processing (NLP) technologies to reproduce stereotypical associations, such as associating Black people with criminality or women with care professions, is a site of major concern and, therefore, much study. Stereotyping is a powerful tool of oppression, but the social and linguistic mechanisms behind it are largely ignored in the NLP field. Thus, we fail to effectively challenge stereotypes and the power asymmetries they reinforce. This opinion paper problematizes several common aspects of current work addressing stereotyping in NLP, and offers practicable suggestions for potential forward directions.
Introducing MARB — A Dataset for Studying the Social Dimensions of Reporting Bias in Language Models
Tom Södahl Bladsjö | Ricardo Muñoz Sánchez
Reporting bias is the tendency for speakers to omit unnecessary or obvious information while mentioning things they consider relevant or surprising. In descriptions of people, reporting bias can manifest as a tendency to over-report attributes that deviate from the norm. While social bias in language models has garnered a lot of attention in recent years, a majority of the existing work equates “bias” with “stereotypes”. We suggest reporting bias as an alternative lens through which to study how social attitudes manifest in language models. We present MARB, a diagnostic dataset for studying the interaction between social bias and reporting bias in language models. We use MARB to evaluate the off-the-shelf behavior of both masked and autoregressive language models and find signs of reporting bias with regard to marginalized identities, mirroring that found in human text. This effect is particularly pronounced when taking gender into account, demonstrating the importance of considering intersectionality when studying social phenomena like biases.
Gender Bias in Nepali-English Machine Translation: A Comparison of LLMs and Existing MT Systems
Supriya Khadka | Bijayan Bhattarai
Bias in Nepali NLP is rarely addressed, as the language is classified as low-resource, which leads to the perpetuation of biases in downstream systems. Our research focuses on gender bias in Nepali-English machine translation, an area that has seen little exploration. With the emergence of Large Language Models (LLMs), there is a unique opportunity to mitigate these biases. In this study, we quantify and evaluate gender bias by constructing an occupation corpus and adapting three gender-bias challenge sets for Nepali. Our findings reveal that gender bias is prevalent in existing translation systems, with translations often reinforcing stereotypes and misrepresenting gender-specific roles. However, LLMs perform significantly better in both gender-neutral and gender-specific contexts, demonstrating less bias than traditional machine translation systems. Despite some quirks, LLMs offer a promising alternative for culture-rich, low-resource languages like Nepali. We also explore how LLMs can improve gender accuracy and mitigate biases in occupational terms, providing a more equitable translation experience. Our work contributes to the growing effort to reduce biases in machine translation and highlights the potential of LLMs to address bias in low-resource languages, paving the way for more inclusive and accurate translation systems.
Mind the Gap: Gender-based Differences in Occupational Embeddings
Olga Kononykhina | Anna-Carolina Haensch | Frauke Kreuter
Large Language Models (LLMs) offer promising alternatives to traditional occupational coding approaches in survey research. Using a German dataset, we examine the extent to which LLM-based occupational coding differs by gender. Our findings reveal systematic disparities: gendered job titles (e.g., “Autor” vs. “Autorin”, meaning “male author” vs. “female author”) frequently result in diverging occupation codes, even when semantically identical. Across all models, 54%–82% of gendered inputs obtain different Top-5 suggestions. The practical impact, however, depends on the model. GPT includes the correct code most often (62%) but demonstrates female bias (up to +18 pp). IBM is less accurate (51%) but largely balanced. Alibaba, Gemini, and MiniLM achieve about 50% correct-code inclusion, and their small (< 10 pp), direction-flipping gaps could indicate sampling noise rather than gender bias. We discuss these findings in the context of fairness and reproducibility in NLP applications for social data.
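A back-of-the-envelope version of the Top-5 comparison described above can be expressed as follows; `suggest_codes` stands in for whichever model produces ranked occupation codes and is purely hypothetical.

```python
# Share of gendered job-title pairs (e.g. "Autor"/"Autorin") whose Top-5
# suggested occupation codes differ; suggest_codes is a placeholder callable
# that returns a ranked list of codes for a job title.
def top5_divergence(pairs, suggest_codes):
    """pairs: list of (masculine_title, feminine_title) tuples."""
    differing = sum(
        set(suggest_codes(m)[:5]) != set(suggest_codes(f)[:5]) for m, f in pairs
    )
    return differing / len(pairs)
```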
Assessing the Reliability of LLMs Annotations in the Context of Demographic Bias and Model Explanation
Hadi Mohammadi | Tina Shahedi | Pablo Mosteiro | Massimo Poesio | Ayoub Bagheri | Anastasia Giachanou
Understanding the sources of variability in annotations is crucial for developing fair NLP systems, especially for tasks like sexism detection where demographic bias is a concern. This study investigates the extent to which annotator demographic features influence labeling decisions compared to text content. Using a Generalized Linear Mixed Model, we quantify this influence, finding that while statistically present, demographic factors account for a minor fraction (~8%) of the observed variance, with tweet content being the dominant factor. We then assess the reliability of Generative AI (GenAI) models as annotators, specifically evaluating whether guiding them with demographic personas improves alignment with human judgments. Our results indicate that simplistic persona prompting often fails to enhance, and sometimes degrades, performance compared to baseline models. Furthermore, explainable AI (XAI) techniques reveal that model predictions rely heavily on content-specific tokens related to sexism rather than on correlates of demographic characteristics. We argue that focusing on content-driven explanations and robust annotation protocols offers a more reliable path towards fairness than persona simulation.
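To make the variance decomposition concrete, here is a simplified stand-in using a linear mixed model with a random intercept per annotator group (the paper fits a Generalized Linear Mixed Model; the column names are assumptions, not the study's data schema).

```python
# Simplified sketch: estimate what fraction of label variance is attributable to
# the annotator grouping, ICC-style. Columns "label", "content_score", and
# "annotator_group" are hypothetical.
import statsmodels.formula.api as smf

def demographic_variance_share(df):
    model = smf.mixedlm("label ~ content_score", df, groups=df["annotator_group"])
    result = model.fit()
    var_group = float(result.cov_re.iloc[0, 0])  # random-intercept variance
    var_resid = result.scale                     # residual variance
    return var_group / (var_group + var_resid)
```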
WoNBias: A Dataset for Classifying Bias & Prejudice Against Women in Bengali Text
Md. Raisul Islam Aupi | Nishat Tafannum | Md. Shahidur Rahman | Kh Mahmudul Hassan | Naimur Rahman
This paper presents WoNBias, a curated Bengali dataset to identify gender-based biases, stereotypes, and harmful language directed at women. It merges digital sources (social media, blogs, news) with offline methods comprising surveys and focus groups, alongside some existing corpora, to compile a total of 31,484 entries (10,656 negative; 10,170 positive; 10,658 neutral). WoNBias reflects the sociocultural subtleties of bias in both Bengali digital and offline conversations. By bridging online and offline biased contexts, the dataset supports content moderation, policy interventions, and equitable NLP research for Bengali, a low-resource language critically underserved by existing tools. WoNBias aims to combat systemic gender discrimination against women on digital platforms, empowering researchers and practitioners to challenge harmful narratives in Bengali-speaking communities.
Strengths and Limitations of Word-Based Task Explainability in Vision Language Models: a Case Study on Biological Sex Biases in the Medical Domain
Lorenzo Bertolini | Valentin Comte | Victoria Ruiz-Serra | Lia Orfei | Mario Ceresa
Vision-language models (VLMs) can achieve high accuracy in medical applications but can retain demographic biases from training data. While multiple works have identified the presence of these biases in many VLMs, it remains unclear how strong their impact at the inference level is. In this work, we study how well a task-level explainability method based on linear combinations of words can detect multiple types of biases, with a focus on medical image classification. By manipulating the training datasets with demographic and non-demographic biases, we show how the adopted approach can detect explicitly encoded biases but fails with implicitly encoded ones, particularly biological sex. Our results suggest that such a failure likely stems from misalignment between sex-describing features in image versus text modalities. Our findings highlight limitations in the evaluated explainability method for detecting implicit biases in medical VLMs.
Wanted: Personalised Bias Warnings for Gender Bias in Language Models
Chiara Di Bonaventura | Michelle Nwachukwu | Maria Stoica
The widespread use of language models, especially Large Language Models, paired with their inherent biases can propagate and amplify societal inequalities. While research has extensively explored methods for bias mitigation and measurement, limited attention has been paid to how such biases are communicated to users, even though communication can have a positive impact on user trust and understanding of these models. Our study addresses this gap by investigating user preferences for gender bias mitigation, measurement and communication in language models. To this end, we conducted a user study targeting female AI practitioners, with eighteen female and one male participant. Our findings reveal that user preferences for bias mitigation and measurement show strong consensus, whereas they vary widely for bias communication, underscoring the importance of tailoring warnings to individual needs. Building on these findings, we propose a framework for user-centred bias reporting, which leverages runtime monitoring techniques to assess and visualise bias in real time and in a customizable fashion.
GG-BBQ: German Gender Bias Benchmark for Question Answering
Shalaka Satheesh | Katrin Klug | Katharina Beckh | Héctor Allende-Cid | Sebastian Houben | Teena Hassan
Within the context of Natural Language Processing (NLP), fairness evaluation is often associated with the assessment of bias and the reduction of associated harm. In this regard, the evaluation is usually carried out using a benchmark dataset, for a task such as Question Answering, created for the measurement of bias in the model’s predictions along various dimensions, including gender identity. In our work, we evaluate gender bias in German Large Language Models (LLMs) using the Bias Benchmark for Question Answering by Parrish et al. (2022) as a reference. Specifically, the templates in the gender identity subset of this English dataset were machine translated into German. The errors in the machine-translated templates were then manually reviewed and corrected with the help of a language expert. We find that manual revision of the translation is crucial when creating datasets for gender bias evaluation because of the limitations of machine translation from English into a language such as German with grammatical gender. Our final dataset comprises two subsets: Subset-I, which consists of group terms related to gender identity, and Subset-II, where group terms are replaced with proper names. We evaluate several LLMs used for German NLP on this newly created dataset and report the accuracy and bias scores. The results show that all models exhibit bias, both along and against existing social stereotypes.
Tag-First: Mitigating Distributional Bias in Synthetic User Profiles through Controlled Attribute Generation
Ismael Garrido-Muñoz | Arturo Montejo-Ráez | Fernando Martínez Santiago
Current bias-testing methods for AI systems often rely on overly simplistic or rigid persona templates, limiting the depth and realism of fairness evaluations. Addressing this critical need for robust bias testing, we introduce a novel framework and an associated tool designed to generate high-quality, diverse, and configurable personas specifically for nuanced bias assessment. Our core innovation lies in a two-stage process: first, generating structured persona tags based solely on user-defined configurations (specified manually or via an included agent tool), ensuring that attribute distributions are controlled and, crucially, are not skewed by an LLM’s inherent biases regarding attribute correlations during the selection phase. Second, transforming these controlled tags into various realistic outputs—including natural language descriptions, CVs, or profiles—suitable for diverse bias testing scenarios. This tag-centric approach preserves ground-truth attributes for analyzing correlations and biases within the generated population and downstream AI applications. We demonstrate the system’s efficacy by generating and validating 1,000 personas, analyzing both the adherence of natural language descriptions to the source tags and the potential biases introduced by the LLM during the transformation step. The provided dataset, including both generated personas and their source tags, enables detailed analysis. This work offers a significant step towards more reliable, controllable, and representative fairness testing in AI development.
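A stripped-down illustration of the tag-first idea, in which attributes are drawn from explicit user-defined distributions before any LLM is involved, might look as follows; the attribute lists and the `llm_call` renderer are placeholders rather than the released tool.

```python
# Stage 1: sample persona tags from controlled distributions (no LLM involved),
# so attribute frequencies are never skewed by model priors.
# Stage 2: render the tags into text with an LLM, keeping the tags as ground truth.
import random

ATTRIBUTE_CONFIG = {  # illustrative attributes and weights
    "gender": {"female": 0.5, "male": 0.5},
    "age_band": {"18-34": 0.4, "35-54": 0.4, "55+": 0.2},
    "country": {"Spain": 0.5, "Mexico": 0.5},
}

def sample_tags(config, rng=random):
    return {
        attr: rng.choices(list(dist), weights=list(dist.values()))[0]
        for attr, dist in config.items()
    }

def render_profile(tags, llm_call):
    prompt = (
        "Write a short, realistic CV-style profile for a person with exactly "
        f"these attributes, without adding new demographic details: {tags}"
    )
    return tags, llm_call(prompt)  # keep ground-truth tags alongside the text
```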
Characterizing non-binary French: A first step towards debiasing gender inference
Marie Flesch | Heather Burnett
This paper addresses a bias of gender inference systems: their binary nature. Based on the observation that, for French, systems based on pattern-matching of grammatical gender markers in “I am” expressions perform better than machine-learning approaches (Ciot et al. 2013), we examine the use of grammatical gender by non-binary individuals. We describe the construction of a corpus of texts produced by non-binary authors on Reddit, (formerly) Twitter, and three forums. Our linguistic analysis shows three main patterns of use: authors who use non-binary markers, authors who consistently use one grammatical gender, and authors who use both feminine and masculine markers. Using this knowledge, we make proposals for improving existing gender inference systems based on grammatical gender.
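As a rough illustration of the pattern-matching baseline this work builds on, the following sketch classifies the token after "je suis" by its grammatical gender marking, with a small, assumed set of non-binary spellings added; the ending lists are far from exhaustive.

```python
import re

# Illustrative (non-exhaustive) ending lists; non-binary forms include the
# median dot, e.g. "heureux·se" or "fatigué.e".
FEM_ENDINGS = r"(?:euse|trice|ive|elle|ère|ée|ante|ente)\b"
MASC_ENDINGS = r"(?:eur|teur|eux|if|el|er|é|ant|ent)\b"
NONBINARY_FORMS = r"(?:\w+·\w+|\w+\.e|iel|ielle)"

def classify_je_suis(text):
    """Label the word following 'je suis' by its gender marking, if any."""
    match = re.search(r"\bje suis\s+(\S+)", text.lower())
    if not match:
        return "no_marker"
    token = match.group(1).strip(".,!?")
    if re.fullmatch(NONBINARY_FORMS, token):
        return "non_binary_marker"
    if re.search(FEM_ENDINGS, token):
        return "feminine_marker"
    if re.search(MASC_ENDINGS, token):
        return "masculine_marker"
    return "no_marker"
```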
Can Explicit Gender Information Improve Zero-Shot Machine Translation?
Van-Hien Tran | Huy Hien Vu | Hideki Tanaka | Masao Utiyama
Large language models (LLMs) have demonstrated strong zero-shot machine translation (MT) performance but often exhibit gender bias that is present in their training data, especially when translating into grammatically gendered languages. In this paper, we investigate whether explicitly providing gender information can mitigate this issue and improve translation quality. We propose a two-step approach: (1) inferring entity gender from context, and (2) incorporating this information into prompts using either Structured Tagging or Natural Language. Experiments with five LLMs across four language pairs show that explicit gender cues consistently reduce gender errors, with structured tagging yielding the largest gains. Our results highlight prompt-level gender disambiguation as a simple yet effective strategy for more accurate and fair zero-shot MT.
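The two prompt formats compared above can be illustrated roughly as follows; the tag syntax, wording, and target language are assumptions, not the authors' exact templates.

```python
# Hedged sketch of gender-cue prompting for zero-shot MT: Structured Tagging
# wraps the entity in an explicit tag, the Natural Language variant states the
# inferred gender in plain words.
def build_prompt(src, entity, gender, style, tgt_lang="Spanish"):
    if style == "structured_tag":
        marked = src.replace(entity, f'<entity gender="{gender}">{entity}</entity>')
        return f"Translate into {tgt_lang}, respecting the gender tags:\n{marked}"
    # natural-language variant
    return (
        f'Translate into {tgt_lang}. Note that "{entity}" refers to a '
        f"{gender} person:\n{src}"
    )

print(build_prompt("The doctor finished her shift.", "doctor", "female",
                   style="structured_tag"))
```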
Colombian Waitresses y Jueces canadienses: Gender and Country Biases in Occupation Recommendations from LLMs
Elisa Forcada Rodríguez | Olatz Perez-de-Vinaspre | Jon Ander Campos | Dietrich Klakow | Vagrant Gautam
One of the goals of fairness research in NLP is to measure and mitigate stereotypical biases that are propagated by NLP systems. However, such work tends to focus on single axes of bias (most often gender) and the English language. Addressing these limitations, we contribute the first study of multilingual intersecting country and gender biases, with a focus on occupation recommendations generated by large language models. We construct a benchmark of prompts in English, Spanish and German, where we systematically vary country and gender, using 25 countries and four pronoun sets. Then, we evaluate a suite of 5 Llama-based models on this benchmark, finding that LLMs encode significant gender and country biases. Notably, we find that even when models show parity for gender or country individually, intersectional occupational biases based on both country and gender persist. We also show that the prompting language significantly affects bias, and instruction-tuned models consistently demonstrate the lowest and most stable levels of bias. Our findings highlight the need for fairness researchers to use intersectional and multilingual lenses in their work.
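A benchmark of this shape can be generated by crossing a template with countries and pronoun sets, as in the sketch below; the template wording and the shortened country and pronoun lists are placeholders, not the released prompts.

```python
# Cross a prompt template with country and pronoun variables to build an
# intersectional benchmark; the paper uses 25 countries and four pronoun sets.
from itertools import product

TEMPLATE = ("My friend from {country} asked me what occupation {pronoun} "
            "should pursue. What would you recommend?")
COUNTRIES = ["Colombia", "Canada", "Germany"]   # truncated placeholder list
PRONOUNS = ["she", "he", "they", "xe"]          # placeholder pronoun sets

def build_benchmark():
    return [
        {"country": c, "pronoun": p, "prompt": TEMPLATE.format(country=c, pronoun=p)}
        for c, p in product(COUNTRIES, PRONOUNS)
    ]
```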
Bias Attribution in Filipino Language Models: Extending a Bias Interpretability Metric for Application on Agglutinative Languages
Lance Calvin Lim Gamboa | Yue Feng | Mark G. Lee
Emerging research on bias attribution and interpretability has revealed how tokens contribute to biased behavior in language models processing English texts. We build on this line of inquiry by adapting the information-theoretic bias attribution score metric for implementation on models handling agglutinative languages—particularly Filipino. We then demonstrate the effectiveness of our adapted method by using it on a purely Filipino model and on three multilingual models—one trained on languages worldwide and two on Southeast Asian data. Our results show that Filipino models are driven towards bias by words pertaining to people, objects, and relationships—entity-based themes that stand in contrast to the action-heavy nature of bias-contributing themes in English (i.e., criminal, sexual, and prosocial behaviors). These findings point to differences in how English and non-English models process inputs linked to sociodemographic groups and bias.
Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models
Aleksandra Sorokovikova | Pavel Chizhov | Iuliia Eremenko | Ivan P. Yamshchikov
Modern language models are trained on large amounts of data. These data inevitably include controversial and stereotypical content, which contains all sorts of biases related to gender, origin, age, etc. As a result, the models express biased points of view or produce different results based on the assigned personality or the personality of the user. In this paper, we investigate various proxy measures of bias in large language models (LLMs). We find that evaluating models with pre-prompted personae on a multi-subject benchmark (MMLU) leads to negligible and mostly random differences in scores. However, if we reformulate the task and ask a model to grade the user’s answer, this reveals more significant signs of bias. Finally, if we ask the model for salary negotiation advice, we see pronounced bias in the answers. With the recent trend towards LLM assistant memory and personalization, these problems open up from a different angle: modern LLM users do not need to pre-prompt the description of their persona, since the model already knows their socio-demographics.
Measuring Gender Bias in Language Models in Farsi
Hamidreza Saffari | Mohammadamin Shafiei | Donya Rooein | Debora Nozza
As Natural Language Processing models become increasingly embedded in everyday life, ensuring that these systems can measure and mitigate bias is critical. While substantial work has been done to identify and mitigate gender bias in English, Farsi remains largely underexplored. This paper presents the first comprehensive study of gender bias in language models in Farsi across three tasks: emotion analysis, question answering, and hurtful sentence completion. We assess a range of language models across all the tasks in zero-shot settings. By adapting established evaluation frameworks for Farsi, we uncover patterns of gender bias that differ from those observed in English, highlighting the urgent need for culturally and linguistically inclusive approaches to bias mitigation in NLP.
A Diachronic Analysis of Human and Model Predictions on Audience Gender in How-to Guides
Nicola Fanton | Sidharth Ranjan | Titus Von Der Malsburg | Michael Roth
We examine audience-specific how-to guides on wikiHow, in English, diachronically by comparing predictions from fine-tuned language models and human judgments. Using both early and revised versions, we quantitatively and qualitatively study how gender-specific features are identified over time. While language model performance remains relatively stable in terms of macro F1-scores, we observe an increased reliance on stereotypical tokens. Notably, both models and human raters tend to overpredict women as an audience, raising questions about bias in the evaluation of educational systems and resources.
ArGAN: Arabic Gender, Ability, and Nationality Dataset for Evaluating Biases in Large Language Models
Ranwa Aly | Yara Allam | Rana Gaber | Christine Basta
Large language models (LLMs) are pretrained on substantial, unfiltered corpora, assembled from a variety of sources. This risks inheriting the deep-rooted biases that exist within them, both implicit and explicit. This is even more apparent in low-resource languages, where corpora may be prioritized by quantity over quality, potentially leading to more unchecked biases. More specifically, we address the biases present in the Arabic language in both general-purpose and Arabic-specialized architectures in three dimensions of demographics: gender, ability, and nationality. To properly assess the fairness of these models, we experiment with bias-revealing prompts and estimate the performance using existing evaluation metrics, and propose adaptations to others.
Assessing Gender Bias of Pretrained Bangla Language Models in STEM and SHAPE Fields
Noor Mairukh Khan Arnob | Saiyara Mahmud | Azmine Toushik Wasi
Gender bias continues to shape societal perceptions across both STEM (Science, Technology, Engineering, and Mathematics) and SHAPE (Social Sciences, Humanities, and the Arts for People and the Economy) domains. While existing studies have explored such biases in English language models, similar analyses in Bangla—spoken by over 240 million people—remain scarce. In this work, we investigate gender-profession associations in Bangla language models. We introduce Pokkhopat, a curated dataset of gendered terms and profession-related words across STEM and SHAPE disciplines. Using a suite of embedding-based bias detection methods—including WEAT, ECT, RND, RIPA, and cosine similarity visualizations—we evaluate 11 Bangla language models. Our findings show that several widely-used open-source Bangla NLP models (e.g., sagorsarker/bangla-bert-base) exhibit significant gender bias, underscoring the need for more inclusive and bias-aware development in low-resource languages like Bangla. We also find that many STEM and SHAPE-related words are absent from these models’ vocabularies, complicating bias detection and possibly amplifying existing biases. This emphasizes the importance of incorporating more diverse and comprehensive training data to mitigate such biases moving forward. Code available at https://github.com/HerWILL-Inc/ACL-2025/.
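For context, one of the embedding association measures listed above (a relative-norm-distance-style score) can be sketched as follows; vectors are assumed to be pre-extracted, and words missing from a model's vocabulary, the coverage problem the abstract notes, must be filtered out beforehand.

```python
# Relative-norm-distance-style association: positive values mean profession
# vectors lie closer to the second attribute centroid than to the first.
import numpy as np

def rnd(profession_vecs, group1_vecs, group2_vecs):
    c1 = np.mean(group1_vecs, axis=0)  # centroid of first attribute group
    c2 = np.mean(group2_vecs, axis=0)  # centroid of second attribute group
    return float(sum(
        np.linalg.norm(p - c1) - np.linalg.norm(p - c2) for p in profession_vecs
    ))
```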
One Size Fits None: Rethinking Fairness in Medical AI
Roland Roller | Michael Hahn | Ajay Madhavan Ravichandran | Bilgin Osmanodja | Florian Oetke | Zeineb Sassi | Aljoscha Burchardt | Klaus Netter | Klemens Budde | Anne Herrmann | Tobias Strapatsas | Peter Dabrock | Sebastian Möller
Machine learning (ML) models are increasingly used to support clinical decision-making. However, real-world medical datasets are often noisy, incomplete, and imbalanced, leading to performance disparities across patient subgroups. These differences raise fairness concerns, particularly when they reinforce existing disadvantages for marginalized groups. In this work, we analyze several medical prediction tasks and demonstrate how model performance varies with patient characteristics. While ML models may demonstrate good overall performance, we argue that subgroup-level evaluation is essential before integrating them into clinical workflows. By conducting a performance analysis at the subgroup level, differences can be clearly identified—allowing, on the one hand, for performance disparities to be considered in clinical practice, and on the other hand, for these insights to inform the responsible development of more effective models. Thereby, our work contributes to a practical discussion around the subgroup-sensitive development and deployment of medical ML models and the interconnectedness of fairness and transparency.
From Measurement to Mitigation: Exploring the Transferability of Debiasing Approaches to Gender Bias in Maltese Language Models
Melanie Galea | Claudia Borg
The advancement of Large Language Models (LLMs) has transformed Natural Language Processing (NLP), enabling performance across diverse tasks with little task-specific training. However, LLMs remain susceptible to social biases, particularly reflecting harmful stereotypes from training data, which can disproportionately affect marginalised communities. We measure gender bias in Maltese LMs, arguing that such bias is harmful as it reinforces societal stereotypes and fails to account for gender diversity, which is especially problematic in gendered, low-resource languages. While bias evaluation and mitigation efforts have progressed for English-centric models, research on low-resourced and morphologically rich languages remains limited. This research investigates the transferability of debiasing methods to Maltese language models, focusing on BERTu and mBERTu, BERT-based monolingual and multilingual models respectively. Bias measurement and mitigation techniques from English are adapted to Maltese, using benchmarks such as CrowS-Pairs and SEAT, alongside debiasing methods Counterfactual Data Augmentation, Dropout Regularization, Auto-Debias, and GuiDebias. We also contribute to future work in the study of gender bias in Maltese by creating evaluation datasets. Our findings highlight the challenges of applying existing bias mitigation methods to linguistically complex languages, underscoring the need for more inclusive approaches in the development of multilingual NLP.
GENDEROUS: Machine Translation and Cross-Linguistic Evaluation of a Gender-Ambiguous Dataset
Janiça Hackenbuchner | Joke Daems | Eleni Gkovedarou
Contributing to research on gender beyond the binary, this work introduces GENDEROUS, a dataset of gender-ambiguous sentences containing gender-marked occupations and adjectives, and sentences with the ambiguous or non-binary pronoun their. We cross-linguistically evaluate how machine translation (MT) systems and large language models (LLMs) translate these sentences from English into four grammatical gender languages: Greek, German, Spanish and Dutch. We show the systems’ continued default to male-gendered translations, with exceptions (particularly for Dutch). Prompting for alternatives, however, shows potential in attaining more diverse and neutral translations across all languages. An LLM-as-a-judge approach was implemented, where benchmarking against gold standards emphasises the continued need for human annotations.
Fine-Tuning vs Prompting Techniques for Gender-Fair Rewriting of Machine Translations
Paolo Mainardi | Federico Garcea | Alberto Barrón-Cedeño
The NLP community is dedicating increasing attention to gender-fair practices, including emerging forms of non-binary language. Given the shift to the prompting paradigm for multiple tasks, direct comparisons between prompted and fine-tuned models in this context are lacking. We aim to fill this gap by comparing prompt engineering and fine-tuning techniques for gender-fair rewriting in Italian. We do so by framing a rewriting task where Italian gender-marked translations of English gender-ambiguous sentences are adapted into a gender-neutral alternative using direct non-binary language. We augment existing datasets with gender-neutral translations and conduct experiments to determine the best architecture and approach to complete such a task, by fine-tuning and prompting seq2seq encoder-decoder and autoregressive decoder-only models. We show that smaller seq2seq models can reach good performance when fine-tuned, even with relatively little data; when it comes to prompts, including task demonstrations is crucial, and we find that chat-tuned models reach the best results in a few-shot setting. We achieve promising results, especially in contexts of limited data and resources.
Some Myths About Bias: A Queer Studies Reading Of Gender Bias In NLP
Filipa Calado
This paper critiques common assumptions about gender bias in NLP, focusing primarily on word vector-based methods for detecting and mitigating bias. It argues that these methods assume a kind of “binary thinking” that goes beyond the gender binary toward a conceptual model that structures and limits the effectiveness of these techniques. Drawing its critique from the Humanities field of Queer Studies, this paper demonstrates that binary thinking drives two “myths” in gender bias research: first, that bias is categorical, measuring bias in terms of presence/absence, and second, that it is zero-sum, where the relations between genders are idealized as symmetrical. Due to their use of binary thinking, each of these myths flattens bias into a measure that cannot distinguish between the types of bias and their effects in language. The paper concludes by briefly pointing to methods that resist binary thinking, such as those that diversify and amplify gender expressions.
GenWriter: Reducing Gender Cues in Biographies through Text Rewriting
Shweta Soundararajan | Sarah Jane Delany
Gendered language is the use of words that indicate an individual’s gender. Though useful in certain contexts, it can reinforce gender stereotypes and introduce bias, particularly in machine learning models used for tasks like occupation classification. When textual content such as biographies contains gender cues, it can influence model predictions, leading to unfair outcomes such as reduced hiring opportunities for women. To address this issue, we propose GenWriter, an approach that integrates Case-Based Reasoning (CBR) with Large Language Models (LLMs) to rewrite biographies in a way that obfuscates gender while preserving semantic content. We evaluate GenWriter by measuring gender bias in occupation classification before and after rewriting the biographies used to train the occupation classification model. Our results show that GenWriter significantly reduces gender bias by 89% in nurse biographies and 62% in surgeon biographies, while maintaining classification accuracy. In comparison, an LLM-only rewriting approach achieves smaller bias reductions (44% and 12% in nurse and surgeon biographies, respectively) and leads to some classification performance degradation.
Examining the Cultural Encoding of Gender Bias in LLMs for Low-Resourced African Languages
Abigail Oppong | Hellina Hailu Nigatu | Chinasa T. Okolo
Large Language Models (LLMs) are deployed in several aspects of everyday life. While the technology could have several benefits, like many socio-technical systems, it also encodes several biases. Trained on large, crawled datasets from the web, these models perpetuate stereotypes and regurgitate representational bias that is rampant in their training data. Languages encode gender in varying ways; some languages are grammatically gendered, while others are not. Bias in the languages themselves may also vary based on cultural, social, and religious contexts. In this paper, we investigate gender bias in LLMs by selecting two languages, Twi and Amharic. Twi is a non-gendered African language spoken in Ghana, while Amharic is a gendered language spoken in Ethiopia. Using these two languages from opposite ends of the continent, with their opposing grammatical gender systems, we evaluate LLMs on three tasks: Machine Translation, Image Generation, and Sentence Completion. Our results give insights into the gender bias encoded in LLMs for two low-resourced languages and broaden the conversation on how culture and social structures play a role in disparate system performance.
Ableism, Ageism, Gender, and Nationality bias in Norwegian and Multilingual Language Models
Martin Sjåvik | Samia Touileb
We investigate biases related to ageism, ableism, nationality, and gender in four Norwegian and two multilingual language models. Our methodology involves using a set of templates constructed around stimuli and attributes relevant to these categories. We use statistical and predictive evaluation methods, including Kendall’s Tau correlation and dependent variable prediction rates, to assess model behaviour and output bias. Our findings indicate that models frequently associate older individuals, people with disabilities, and poorer countries with negative attributes, potentially reinforcing harmful stereotypes. However, most tested models appear to handle gender-related biases more effectively. Our findings also indicate a correlation between the sentiment of the input and that of the output.
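The correlation check mentioned above reduces to a standard Kendall's Tau computation, sketched here with placeholder sentiment scores.

```python
# Correlate the sentiment of template inputs with the sentiment of model
# outputs; the score lists below are placeholders.
from scipy.stats import kendalltau

input_sentiment = [0.8, -0.3, 0.1, -0.7, 0.5]
output_sentiment = [0.6, -0.2, 0.0, -0.5, 0.4]

tau, p_value = kendalltau(input_sentiment, output_sentiment)
print(f"Kendall's tau = {tau:.2f} (p = {p_value:.3f})")
```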
Disentangling Biased Representations: A Causal Intervention Framework for Fairer NLP Models
Yangge Qian | Yilong Hu | Siqi Zhang | Xu Gu | Xiaolin Qin
Natural language processing (NLP) systems often inadvertently encode and amplify social biases through entangled representations of demographic attributes and task-related attributes. To mitigate this, we propose a novel framework that combines causal analysis with practical intervention strategies. The method leverages attribute-specific prompting to isolate sensitive attributes while applying information-theoretic constraints to minimize spurious correlations. Experiments across six language models and two classification tasks demonstrate its effectiveness. We hope this work will provide the NLP community with a causal disentanglement perspective for achieving fairness in NLP systems.
Towards Massive Multilingual Holistic Bias
Xiaoqing Tan | Prangthip Hansanti | Arina Turkatenko | Joe Chuang | Carleigh Wood | Bokai Yu | Christophe Ropers | Marta R. Costa-jussà
In the current landscape of automatic language generation, there is a need to understand, evaluate, and mitigate demographic biases, as existing models are becoming increasingly multilingual. To address this, we present the initial eight languages from the Massive Multilingual Holistic Bias (MMHB) dataset and benchmark, consisting of approximately 6 million sentences. The sentences are designed to induce biases towards different groups of people, which can yield significant results when they are used as a benchmark to test different text generation models. To further scale up in terms of both language coverage and size, and to leverage limited human translation, we use a systematic approach to independently translate sentence parts. This technique carefully designs a structure to dynamically generate multiple sentence variations and significantly reduces the human translation workload. The translation process has been meticulously conducted to avoid an English-centric perspective and to include all necessary morphological variations for languages that require them, improving on the original English HOLISTICBIAS. Finally, we use MMHB to report results on gender bias and added toxicity in MT tasks.
Exploring Gender Bias in Large Language Models: An In-depth Dive into the German Language
Kristin Gnadt | David Thulke | Simone Kopeinik | Ralf Schlüter
In recent years, various methods have been proposed to evaluate gender bias in large language models (LLMs). A key challenge lies in the transferability of bias measurement methods initially developed for the English language when applied to other languages. This work aims to contribute to this research strand by presenting five German datasets for gender bias evaluation in LLMs. The datasets are grounded in well-established concepts of gender bias and are accessible through multiple methodologies. Our findings, reported for eight multilingual LLM models, reveal unique challenges associated with gender bias in German, including the ambiguous interpretation of male occupational terms and the influence of seemingly neutral nouns on gender perception. This work contributes to the understanding of gender bias in LLMs across languages and underscores the necessity for tailored evaluation frameworks.
Adapting Psycholinguistic Research for LLMs: Gender-inclusive Language in a Coreference Context
Marion Bartl | Thomas Brendan Murphy | Susan Leavy
Gender-inclusive language is often used with the aim of ensuring that all individuals, regardless of gender, can be associated with certain concepts. While psycholinguistic studies have examined its effects in relation to human cognition, it remains unclear how Large Language Models (LLMs) process gender-inclusive language. Given that commercial LLMs are gaining an increasingly strong foothold in everyday applications, it is crucial to examine whether LLMs in fact interpret gender-inclusive language neutrally, because the language they generate has the potential to influence the language of their users. This study examines whether LLM-generated coreferent terms align with a given gender expression or reflect model biases. Adapting psycholinguistic methods from French to English and German, we find that in English, LLMs generally maintain the antecedent’s gender but exhibit underlying masculine bias. In German, this bias is much stronger, overriding all tested gender-neutralization strategies.
Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora
Erik Derner | Sara Sansalvador De La Fuente | Yoan Gutierrez | Paloma Moreda Pozo | Nuria M Oliver
Large language models (LLMs) often inherit and amplify social biases embedded in their training data. A prominent social bias is gender bias. In this regard, prior work has mainly focused on gender stereotyping bias – the association of specific roles or traits with a particular gender – in English, and on evaluating gender bias in model embeddings or generated outputs. In contrast, gender representation bias – the unequal frequency of references to individuals of different genders – in the training corpora has received less attention. Yet such imbalances in the training data constitute an upstream source of bias that can propagate and intensify throughout the entire model lifecycle. To fill this gap, we propose a novel LLM-based method to detect and quantify gender representation bias in LLM training data in gendered languages, where grammatical gender challenges the applicability of methods developed for English. By leveraging the LLMs’ contextual understanding, our approach automatically identifies and classifies person-referencing words in gendered language corpora. Applied to four Spanish-English benchmarks and five Valencian corpora, our method reveals substantial male-dominant imbalances. We show that such biases in training data affect model outputs, but can, surprisingly, be mitigated by small-scale training on datasets that are biased towards the opposite gender. Our findings highlight the need for corpus-level gender bias analysis in multilingual NLP. We make our code and data publicly available.