Mikaela Irene Fudolig
2026
Gender Disparities in LLM-Based Intimate Partner Violence Detection
Tabia Tanzin Prama | Mikaela Irene Fudolig | Abigail M. Crocker | Christopher M. Danforth | Peter Dodds
Proceedings of the Seventh Workshop on Natural Language Processing and Computational Social Science
Intimate Partner Violence (IPV) is a major public health concern, and large language models (LLMs) are increasingly used for support and information-seeking in sensitive domains. We examine whether LLMs perceive relationship abuse differently depending on victim–perpetrator gender configuration. Using 475 Reddit posts from r/relationship_advice, we generate counterfactual variants by swapping gendered identifiers to create four dyads: female–female (F/F), female–male (F/M), male–female (M/F), and male–male (M/M), where the first position denotes the victim. Four recent LLMs (GPT-5o, Gemini 3, Llama 4, and Grok 3) evaluate each variant using a structured questionnaire covering IPV, perpetrator intent, cheating, and abuse subtypes. Results show substantial variation across models and dyads. Abuse and intent detection systematically decrease in mixed-gender dyads where the victim is male, with female perpetrator identity emerging as a consistent negative predictor of abuse recognition. Mixed-effects logistic regression confirms that gender roles significantly shape model outputs. Our findings suggest that LLMs reproduce gendered biases from online training data, with implications for support-related deployment. Code and resources are available at GitHub.
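The counterfactual construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each post has already been converted into a template whose gendered identifiers are replaced by role-specific placeholders (`V_*` for the victim, `P_*` for the perpetrator). The placeholder names, the `TERMS` table, the `make_dyads` helper, and the example sentence are all hypothetical and exist only for illustration.

```python
from itertools import product

# Hypothetical lookup of gendered identifiers, keyed by gender code.
TERMS = {
    "F": {"tag": "F", "noun": "girlfriend", "subj": "she", "obj": "her", "poss": "her"},
    "M": {"tag": "M", "noun": "boyfriend", "subj": "he", "obj": "him", "poss": "his"},
}


def make_dyads(template):
    """Fill a role-annotated template once per victim/perpetrator gender pair.

    Returns a dict keyed by dyad label ("F/F", "F/M", "M/F", "M/M"),
    where the first position denotes the victim.
    """
    variants = {}
    for victim, perpetrator in product("FM", repeat=2):
        # Victim terms fill V_* placeholders, perpetrator terms fill P_*.
        fields = {f"V_{key}": value for key, value in TERMS[victim].items()}
        fields.update({f"P_{key}": value for key, value in TERMS[perpetrator].items()})
        variants[f"{victim}/{perpetrator}"] = template.format(**fields)
    return variants


# Illustrative template only; not taken from the dataset.
template = (
    "I (25{V_tag}) have been with my {P_noun} (27{P_tag}) for two years. "
    "Last week {P_subj} took my phone and would not give it back."
)

for label, text in make_dyads(template).items():
    print(label, "->", text)
```

Because every variant is generated from the same template, the four texts differ only in the gendered identifiers, which is what allows differences in model responses to be attributed to the gender configuration rather than to the content of the post.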