Equilibrium Dynamics and Mitigation of Gender Bias in Synthetically Generated Data

Ashish Kattamuri, Arpita Vats, Harshwardhan Fartale, Rahul Raja, Akshata Kishore Moharir, Ishita Prasad


Abstract
Recursive prompting with large language models enables scalable synthetic dataset generation but introduces the risk of bias amplification. We investigate gender bias dynamics across three generations of recursive text generation using three complementary evaluation frameworks: rule-based pattern matching, embedding-based semantic similarity, and downstream task performance. Experiments with three initial bias levels (0.1, 0.3, 0.6) and four mitigation strategies reveal equilibrium dynamics rather than monotonic amplification: low initial bias amplifies toward the model’s inherent bias level (+36%), whereas high initial bias decays toward it (-26%). Among mitigation methods, contrastive augmentation, which introduces gender-swapped variants, achieves significant downstream bias reduction (98.8% for low initial bias and 91% on average) despite producing higher embedding-based bias scores. This paradox demonstrates that semantic similarity metrics may diverge from behavioral fairness outcomes, highlighting the need for multidimensional evaluation in responsible synthetic data generation.
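The abstract describes contrastive augmentation only as introducing gender-swapped variants. The following is a minimal sketch of that general idea, not the authors' implementation: the swap table `SWAP`, and the helper names `gender_swap` and `contrastive_augment`, are hypothetical, and the word list is deliberately tiny. Ambiguous forms such as "her" (which can map to "him" or "his") are left out because resolving them would need part-of-speech information.

```python
import re

# Hypothetical, deliberately small swap table (an assumption for illustration,
# not the word list used in the paper). Ambiguous forms like "her" are omitted.
SWAP = {
    "he": "she", "she": "he",
    "himself": "herself", "herself": "himself",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "father": "mother", "mother": "father",
}

# Match any listed word as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SWAP)) + r")\b", re.IGNORECASE
)


def gender_swap(text: str) -> str:
    """Replace each gendered term with its counterpart, preserving case."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAP[word.lower()]
        if word.isupper():
            return swapped.upper()
        if word[0].isupper():
            return swapped.capitalize()
        return swapped

    return _PATTERN.sub(repl, text)


def contrastive_augment(texts: list[str]) -> list[str]:
    """Keep every original text and add its swapped variant when it differs."""
    augmented = []
    for text in texts:
        augmented.append(text)
        swapped = gender_swap(text)
        if swapped != text:
            augmented.append(swapped)
    return augmented
```

For example, `contrastive_augment(["She is a mother."])` keeps the original sentence and adds "He is a father.", while a sentence with no listed terms passes through unchanged. A realistic pipeline would likely need a much larger lexicon and handling for names and ambiguous pronouns.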
Anthology ID:
2026.ltedi-1.4
Volume:
Proceedings of the Sixth Workshop on Language Technology for Equality, Diversity, Inclusion
Month:
July
Year:
2026
Address:
Virtual (Online)
Editors:
Bharathi Raja Chakravarthi, Bharathi B, Paul Buitelaar, Durairaj Thenmozhi, Miguel Ángel García Cumbreras, Salud María Jiménez Zafra
Venues:
LTEDI | WS
Publisher:
Association for Computational Linguistics
Pages:
37–42
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.ltedi-1.4/
Cite (ACL):
Ashish Kattamuri, Arpita Vats, Harshwardhan Fartale, Rahul Raja, Akshata Kishore Moharir, and Ishita Prasad. 2026. Equilibrium Dynamics and Mitigation of Gender Bias in Synthetically Generated Data. In Proceedings of the Sixth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 37–42, Virtual (Online). Association for Computational Linguistics.
Cite (Informal):
Equilibrium Dynamics and Mitigation of Gender Bias in Synthetically Generated Data (Kattamuri et al., LTEDI 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.ltedi-1.4.pdf