KLAAD: Refining Attention Mechanisms to Reduce Societal Bias in Generative Language Models

Seorin Kim, Dongyoung Lee, Jaejin Lee


Abstract
Large language models (LLMs) often exhibit societal biases in their outputs, prompting ethical concerns regarding fairness and harm. In this work, we propose KLAAD (KL-Attention Alignment Debiasing), an attention-based debiasing framework that implicitly aligns attention distributions between stereotypical and anti-stereotypical sentence pairs without directly modifying model weights. KLAAD introduces a composite training objective combining Cross-Entropy, KL divergence, and Triplet losses, guiding the model to attend consistently across biased and unbiased contexts while preserving fluency and coherence. Experimental evaluation demonstrates that KLAAD improves bias mitigation on both the BBQ and BOLD benchmarks with minimal impact on language modeling quality. The results indicate that attention-level alignment offers a principled solution for mitigating bias in generative language models.
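To make the shape of the composite objective concrete, below is a minimal PyTorch-style sketch of a loss combining the three terms the abstract names. It is an illustration under stated assumptions, not the authors' implementation: the function name klaad_loss, the weights lambda_kl and lambda_tri, the use of the last attention layer averaged over heads, and the precomputed triplet embeddings are all hypothetical, and it assumes a Hugging Face causal LM and sentence pairs tokenized to equal length.

import torch
import torch.nn.functional as F

def klaad_loss(model, stereo_ids, anti_ids,
               anchor_emb, pos_emb, neg_emb,
               lambda_kl=1.0, lambda_tri=1.0):
    # Cross-Entropy (language-modeling) loss on both sentences of the pair,
    # which preserves fluency and coherence during debiasing.
    out_s = model(stereo_ids, labels=stereo_ids, output_attentions=True)
    out_a = model(anti_ids, labels=anti_ids, output_attentions=True)
    ce = out_s.loss + out_a.loss

    # KL term: align the attention distributions of the stereotypical and
    # anti-stereotypical sentences (here the last layer, averaged over
    # heads; assumes both sentences are padded to the same length).
    attn_s = out_s.attentions[-1].mean(dim=1)  # (batch, seq, seq)
    attn_a = out_a.attentions[-1].mean(dim=1)
    kl = F.kl_div(attn_s.clamp_min(1e-9).log(), attn_a,
                  reduction="batchmean")

    # Triplet term: pull the pair's sentence representations together while
    # pushing away an unrelated negative (embeddings precomputed elsewhere).
    tri = F.triplet_margin_loss(anchor_emb, pos_emb, neg_emb, margin=1.0)

    return ce + lambda_kl * kl + lambda_tri * tri

In training, such a loss would be summed over batches of stereotype/anti-stereotype pairs; the KL term is what nudges the model to attend consistently across the two contexts, while the Cross-Entropy term keeps language modeling quality intact.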
Anthology ID:
2025.emnlp-main.774
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
15324–15345
Language:
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.774/
Cite (ACL):
Seorin Kim, Dongyoung Lee, and Jaejin Lee. 2025. KLAAD: Refining Attention Mechanisms to Reduce Societal Bias in Generative Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 15324–15345, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
KLAAD: Refining Attention Mechanisms to Reduce Societal Bias in Generative Language Models (Kim et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.774.pdf
Checklist:
2025.emnlp-main.774.checklist.pdf