DeFrame: Debiasing Large Language Models Against Framing Effects

Kahee Lim, Soyeon Kim, Steven Euijong Whang


Abstract
As large language models (LLMs) are increasingly deployed in real-world applications, ensuring that their responses are fair across demographics has become crucial. Despite many efforts, hidden bias remains an ongoing challenge: LLMs appear fair under standard evaluations, but can produce biased responses outside those evaluation settings. In this paper, we identify framing – differences in how semantically equivalent prompts are expressed (e.g., “A is better than B” vs. “B is worse than A”) – as an underexplored contributor to this gap. We first introduce the concept of “framing disparity” to quantify the impact of framing on fairness evaluation. By augmenting fairness evaluation benchmarks with alternative framings, we find that (1) fairness scores vary significantly with framing and (2) existing debiasing methods improve overall (i.e., frame-averaged) fairness, but often fail to reduce framing-induced disparities. To address this, we propose a framing-aware debiasing method that encourages LLMs to be more consistent across framings. Experiments demonstrate that our approach reduces overall bias and improves robustness against framing disparities, enabling LLMs to produce fairer and more consistent responses.
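The distinction the abstract draws between frame-averaged fairness and framing-induced disparity can be illustrated with a small sketch. The abstract does not give the paper's formal definition of framing disparity, so the max-minus-min gap below is an assumption made for illustration only, as are the function names, the frame labels, and the example scores.

```python
from statistics import mean


def frame_averaged_bias(bias_by_frame):
    """Mean bias score over all framings of the same benchmark items."""
    return mean(bias_by_frame.values())


def framing_gap(bias_by_frame):
    """Spread of bias scores across framings (max minus min).

    Assumption: a stand-in for "framing disparity", not the paper's definition.
    """
    scores = list(bias_by_frame.values())
    return max(scores) - min(scores)


# Hypothetical bias scores for one model under two semantically equivalent framings.
scores = {"positive": 0.10, "negative": 0.30}

avg = frame_averaged_bias(scores)
gap = framing_gap(scores)
print(f"frame-averaged bias: {avg:.2f}, framing gap: {gap:.2f}")
```

Under this reading, a debiasing method could lower the frame-averaged score while leaving the gap unchanged, which is the failure mode the abstract attributes to existing methods.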
Anthology ID:
2026.findings-acl.1777
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
35672–35707
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1777/
Cite (ACL):
Kahee Lim, Soyeon Kim, and Steven Euijong Whang. 2026. DeFrame: Debiasing Large Language Models Against Framing Effects. In Findings of the Association for Computational Linguistics: ACL 2026, pages 35672–35707, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
DeFrame: Debiasing Large Language Models Against Framing Effects (Lim et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1777.pdf
Checklist:
 2026.findings-acl.1777.checklist.pdf