@inproceedings{shrestha-srinivasan-2025-llm,
  title     = {{LLM} Bias Detection and Mitigation through the Lens of Desired Distributions},
  author    = {Shrestha, Ingroj and
               Srinivasan, Padmini},
  editor    = {Christodoulopoulos, Christos and
               Chakraborty, Tanmoy and
               Rose, Carolyn and
               Peng, Violet},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  month     = nov,
  year      = {2025},
  address   = {Suzhou, China},
  publisher = {Association for Computational Linguistics},
  url       = {https://preview.aclanthology.org/ingest-luhme/2025.emnlp-main.76/},
  doi       = {10.18653/v1/2025.emnlp-main.76},
  pages     = {1464--1480},
  isbn      = {979-8-89176-332-6},
  abstract  = {Although prior work on bias mitigation has focused on promoting social equality and demographic parity, less attention has been given to aligning LLM{'}s outputs to desired distributions. For example, we might want to align a model with real-world distributions to support factual grounding. Thus, we define bias as deviation from a desired distribution, which may be an equal or real-world distribution, depending on application goals. We propose a weighted adaptive loss based fine-tuning method that aligns LLM{'}s gender{--}profession output distribution with the desired distribution, while preserving language modeling capability. Using 3 profession sets{---}male-dominated, female-dominated, and gender-balanced{---}derived from U.S. labor statistics (2024), we assess both our adaptive method for reflecting reality and a non-adaptive variant for equality. Across three masked language models, bias is observed under both distributions. We achieve near-complete mitigation under equality and 30{--}75{\%} reduction under real-world settings. Autoregressive LLMs show no bias under equality but notable bias under real-world settings, with the Llama Instruct models (3.2-3B, 3.1-8B) achieving a 50{--}62{\%} reduction.},
}
Markdown (Informal)
[LLM Bias Detection and Mitigation through the Lens of Desired Distributions](https://preview.aclanthology.org/ingest-luhme/2025.emnlp-main.76/) (Shrestha & Srinivasan, EMNLP 2025)
ACL