Fairness Evaluation and Inference Level Mitigation in LLMs

Afrozah Nadeem; Mark Dras; Usman Naseem

Abstract

Large language models often display undesirable behaviors embedded in their internal representations. These behaviors undermine fairness and lead to inconsistency drift, amplification of harmful content, and the propagation of unwanted patterns during extended dialogue. Training-time and data-centric methods attempt to reduce these effects, but they are computationally expensive, irreversible once deployed, and slow to adapt to new conversational contexts. Pruning-based methods provide a flexible and transparent way to reduce bias by adjusting the neurons responsible for certain behaviors. However, most existing approaches are static: once a neuron is removed, the model can no longer adapt when the conversation or context changes. To address this, we propose a dynamic, reversible, pruning-based framework that detects context-aware neuron activations and applies adaptive masking to modulate their influence during generation. Our inference-time solution provides fine-grained, memory-aware mitigation that preserves model knowledge and yields more coherent behavior across multilingual single- and multi-turn dialogues, enabling dynamic fairness control in real-world conversational AI.
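
As a rough illustration of what reversible, inference-time masking could look like, the sketch below scales down flagged hidden units instead of deleting them, so the mask can be recomputed or cleared on every dialogue turn. It is not the authors' implementation: the class name `ReversibleNeuronMask`, the per-neuron `scores`, the `threshold`, and the `damping` factor are all hypothetical, and the abstract does not say how neurons are scored or how strongly they are masked.

```python
class ReversibleNeuronMask:
    """Hypothetical soft mask over one layer's hidden units (illustrative only)."""

    def __init__(self, size, threshold=0.5, damping=0.25):
        self.size = size
        self.threshold = threshold  # assumed cutoff on the per-neuron score
        self.damping = damping      # assumed scale applied to flagged neurons
        self.scale = [1.0] * size   # 1.0 means the neuron is left untouched

    def update(self, scores):
        """Recompute the mask from per-neuron scores for the current context."""
        if len(scores) != self.size:
            raise ValueError("expected one score per neuron")
        self.scale = [
            self.damping if s > self.threshold else 1.0 for s in scores
        ]

    def apply(self, activations):
        """Scale activations; the model weights themselves are never modified."""
        return [a * m for a, m in zip(activations, self.scale)]

    def reset(self):
        """Undo the masking, for example when the conversation context changes."""
        self.scale = [1.0] * self.size


# Usage: flag neurons 0 and 3 for this turn, then restore them.
mask = ReversibleNeuronMask(size=4)
activations = [2.0, -1.0, 0.5, 3.0]
mask.update([0.9, 0.1, 0.2, 0.7])   # made-up scores for illustration
masked = mask.apply(activations)    # neurons 0 and 3 are scaled by 0.25
mask.reset()
restored = mask.apply(activations)  # identical to the original activations
```

Because the mask is a separate per-turn scaling vector, clearing it returns the layer to its original behavior, which is the property that distinguishes this kind of masking from static pruning.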

Anthology ID: 2026.findings-acl.1452
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 29048–29065
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1452/
Cite (ACL): Afrozah Nadeem, Mark Dras, and Usman Naseem. 2026. Fairness Evaluation and Inference Level Mitigation in LLMs. In Findings of the Association for Computational Linguistics: ACL 2026, pages 29048–29065, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Fairness Evaluation and Inference Level Mitigation in LLMs (Nadeem et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1452.pdf
Checklist: 2026.findings-acl.1452.checklist.pdf
