Yilong Hu


2025

Disentangling Biased Representations: A Causal Intervention Framework for Fairer NLP Models
Yangge Qian | Yilong Hu | Siqi Zhang | Xu Gu | Xiaolin Qin
Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Natural language processing (NLP) systems often inadvertently encode and amplify social biases through entangled representations of demographic attributes and task-related attributes. To mitigate this, we propose a novel framework that combines causal analysis with practical intervention strategies. The method leverages attribute-specific prompting to isolate sensitive attributes while applying information-theoretic constraints to minimize spurious correlations. Experiments across six language models and two classification tasks demonstrate its effectiveness. We hope this work will provide the NLP community with a causal disentanglement perspective for achieving fairness in NLP systems.