Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics?

Zhengyang Shan, Aaron Mueller


Abstract
We investigate how independent demographic bias mechanisms are from general demographic recognition in language models. Using a multi-task evaluation setup where demographics are associated with names, professions, and education levels, we measure whether models can be debiased while preserving demographic detection capabilities. We compare attribution-based and correlation-based methods for locating bias features. We find that targeted sparse autoencoder feature ablations in Gemma-2-9B reduce bias without degrading recognition performance: attribution-based ablations mitigate race and gender profession stereotypes while preserving name recognition accuracy, whereas correlation-based ablations are more effective for education bias. Qualitative analysis further reveals that removing attribution features in education tasks induces “prior collapse”, thus increasing overall bias. This highlights the need for dimension-specific interventions. Overall, our results show that demographic bias arises from task-specific mechanisms rather than absolute demographic markers, and that mechanistic inference-time interventions can enable surgical debiasing without compromising core model capabilities.
Anthology ID:
2026.eacl-long.199
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
EACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
4241–4265
Language:
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.199/
DOI:
Bibkey:
Cite (ACL):
Zhengyang Shan and Aaron Mueller. 2026. Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics?. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4241–4265, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics? (Shan & Mueller, EACL 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.199.pdf