Shweta Soundararajan


2025

GenWriter: Reducing Gender Cues in Biographies through Text Rewriting
Shweta Soundararajan | Sarah Jane Delany
Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Gendered language is the use of words that indicate an individual’s gender. Though useful in certain contexts, it can reinforce gender stereotypes and introduce bias, particularly in machine learning models used for tasks like occupation classification. When textual content such as biographies contains gender cues, it can influence model predictions, leading to unfair outcomes such as reduced hiring opportunities for women. To address this issue, we propose GenWriter, an approach that integrates Case-Based Reasoning (CBR) with Large Language Models (LLMs) to rewrite biographies in a way that obfuscates gender while preserving semantic content. We evaluate GenWriter by measuring gender bias in occupation classification before and after rewriting the biographies used to train the occupation classification model. Our results show that GenWriter significantly reduces gender bias, by 89% in nurse biographies and 62% in surgeon biographies, while maintaining classification accuracy. In comparison, an LLM-only rewriting approach achieves smaller bias reductions (44% and 12% in nurse and surgeon biographies, respectively) and leads to some degradation in classification performance.

2024

Investigating Gender Bias in Large Language Models Through Text Generation
Shweta Soundararajan | Sarah Jane Delany
Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP 2024)