Disentangling Biased Representations: A Causal Intervention Framework for Fairer NLP Models

Yangge Qian, Yilong Hu, Siqi Zhang, Xu Gu, Xiaolin Qin


Abstract
Natural language processing (NLP) systems often inadvertently encode and amplify social biases through entangled representations of demographic and task-related attributes. To mitigate this, we propose a novel framework that combines causal analysis with practical intervention strategies. The method leverages attribute-specific prompting to isolate sensitive attributes while applying information-theoretic constraints to minimize spurious correlations between them and task-related features. Experiments across six language models and two classification tasks demonstrate its effectiveness. We hope this work provides the NLP community with a causal disentanglement perspective for achieving fairness in NLP systems.
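The abstract's "information-theoretic constraints to minimize spurious correlations" suggests a penalty that pushes the task-relevant and sensitive-attribute parts of a representation toward statistical independence. The sketch below is a minimal, hypothetical illustration of that idea using a cross-covariance penalty, a common linear proxy for mutual-information minimization; it is not the paper's implementation, and the subspace split, tensor shapes, and lambda_dis weight are all assumptions made for illustration.

    # Illustrative sketch only; the paper's exact objective is not reproduced here.
    import torch

    def cross_covariance_penalty(z_task: torch.Tensor, z_sens: torch.Tensor) -> torch.Tensor:
        """Squared Frobenius norm of the cross-covariance between two
        representation subspaces; zero when the centered subspaces are
        linearly uncorrelated."""
        z_task = z_task - z_task.mean(dim=0, keepdim=True)
        z_sens = z_sens - z_sens.mean(dim=0, keepdim=True)
        n = z_task.size(0)
        cov = z_task.T @ z_sens / (n - 1)  # (d_task, d_sens) cross-covariance
        return (cov ** 2).sum()

    # Hypothetical usage: split an encoder output into assumed subspaces and
    # add the penalty to the task loss so training discourages demographic
    # signal from leaking into the task-relevant part.
    z = torch.randn(32, 768)                  # batch of encoder outputs (assumed shape)
    z_task, z_sens = z[:, :512], z[:, 512:]   # assumed subspace split
    penalty = cross_covariance_penalty(z_task, z_sens)
    # total_loss = task_loss + lambda_dis * penalty  # lambda_dis: trade-off weight

A cross-covariance penalty only captures linear dependence; stronger information-theoretic constraints (e.g., kernel- or estimator-based mutual information bounds) follow the same training pattern but replace the penalty term.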
Anthology ID: 2025.gebnlp-1.33
Volume: Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month: August
Year: 2025
Address: Vienna, Austria
Editors: Agnieszka Faleńska, Christine Basta, Marta Costa-jussà, Karolina Stańczak, Debora Nozza
Venues: GeBNLP | WS
Publisher: Association for Computational Linguistics
Pages: 393–402
URL: https://preview.aclanthology.org/landing_page/2025.gebnlp-1.33/
Cite (ACL): Yangge Qian, Yilong Hu, Siqi Zhang, Xu Gu, and Xiaolin Qin. 2025. Disentangling Biased Representations: A Causal Intervention Framework for Fairer NLP Models. In Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 393–402, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Disentangling Biased Representations: A Causal Intervention Framework for Fairer NLP Models (Qian et al., GeBNLP 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.gebnlp-1.33.pdf