Chahat Raj
2026
Talent or Luck? Evaluating Attribution Bias in Large Language Models
Chahat Raj | Mahika Banerjee | Jinhao Pan | Aylin Caliskan | Antonios Anastasopoulos | Ziwei Zhu
Findings of the Association for Computational Linguistics: ACL 2026
When a student fails an exam, do we tend to blame their effort or the test’s difficulty? Attribution, defined as how reasons are assigned to event outcomes, shapes perceptions, reinforces stereotypes, and influences decisions. Attribution Theory explains how people attribute causes to internal factors (effort, ability) or external ones (task difficulty, luck). LLMs’ attribution of event outcomes based on demographics carries important fairness implications. Most works exploring social biases in LLMs focus on surface-level associations or isolated stereotypes. This work proposes a cognitively grounded bias evaluation framework to identify how models’ reasoning disparities shape demographic bias across three contexts: single-actor, actor–actor, and actor–observer, capturing comparative and perspective-driven biases overlooked in prior work. Introducing a 140k-prompt benchmark covering ten scenarios and four social dimensions, our analyses reveal attribution asymmetries across identities that vary in multi-actor and observer settings, suggesting that other identities influence bias. This work underscores the need for cognitively grounded bias evaluation and informs future debiasing efforts through the proposed framework.
VIGNETTE: Socially Grounded Bias Evaluation for Vision-Language Models
Chahat Raj | Bowen Wei | Aylin Caliskan | Antonios Anastasopoulos | Ziwei Zhu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While bias in large language models (LLMs) is well-studied, similar concerns in vision-language models (VLMs) have received comparatively less attention. Existing VLM bias studies often focus on portrait-style images and gender-occupation associations, overlooking broader and more complex social stereotypes and their implied harm. This work introduces Vignette, a large-scale VQA benchmark with 30M+ images for evaluating bias in VLMs through a question-answering framework spanning four directions: factuality, perception, stereotyping, and decision making. Going beyond narrowly centered studies, we assess how VLMs interpret identities in contextualized settings, revealing how models make trait and capability assumptions and exhibit patterns of discrimination. Drawing from social psychology, we examine how VLMs connect visual identity cues to trait and role-based inferences, encoding social hierarchies through biased selections. Our findings uncover subtle, multifaceted, and surprising stereotypical patterns, offering insights into how VLMs construct social meaning from inputs.
2025
Toward Inclusive Language Models: Sparsity-Driven Calibration for Systematic and Interpretable Mitigation of Social Biases in LLMs
Prommy Sultana Hossain | Chahat Raj | Ziwei Zhu | Jessica Lin | Emanuela Marasco
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Language Models (LLMs) such as GPT and LLaMA excel in natural language tasks, e.g., text generation and machine translation. However, inherent biases from training on vast Internet datasets potentially amplify harmful stereotypes—widely held, oversimplified, and often inaccurate generalizations about groups of people. Our contribution introduces a novel, systematic, and architecture-aware method to identify and mitigate stereotypical bias in decoder-only transformer models. This interpretable approach operates without gradient access or retraining from scratch. We first evaluate bias and then apply a bias localization mechanism that correlates internal activations with a newly defined Context Influence (CI) Score. Our method pinpoints specific attention heads that consistently align with biased shifts in model predictions. To mitigate this, we introduce a soft pruning strategy that scales attention head parameters based on their correlation strength, followed by lightweight fine-tuning to maintain fluent text generation. Experiments across five models demonstrate our approach reduces bias by up to 37% on BBQ, 32% on StereoSet, and 33% on CrowS-Pairs while simultaneously improving reasoning performance on MMLU by up to 10%.
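The soft pruning step described above, scaling attention heads according to how strongly they correlate with biased shifts, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the linear scaling rule, and the `strength` and `threshold` parameters are all assumptions made for illustration, and the per-head correlations are taken as given rather than computed from a CI Score.

```python
def soft_prune_scales(correlations, strength=0.5, threshold=0.0):
    """Map per-head correlation values to multiplicative scale factors.

    Hypothetical rule (an assumption, not the paper's formula):
    scale = 1 - strength * |correlation| for heads above the threshold,
    and 1.0 (untouched) otherwise.
    """
    scales = []
    for r in correlations:
        magnitude = min(abs(r), 1.0)  # clamp to the valid correlation range
        if magnitude <= threshold:
            scales.append(1.0)
        else:
            scales.append(1.0 - strength * magnitude)
    return scales


def apply_scales(head_weights, scales):
    """Scale each head's parameters (a flat list of floats per head)."""
    return [[w * s for w in head] for head, s in zip(head_weights, scales)]


# Toy example: three heads with made-up correlation values.
toy_correlations = [0.0, 0.5, -1.0]
toy_weights = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
toy_scaled = apply_scales(toy_weights, soft_prune_scales(toy_correlations))
```

Unlike hard pruning, which removes a head entirely, a scale factor between 0 and 1 only dampens a head's contribution, which is one plausible reason a lightweight fine-tuning pass afterwards can be enough to keep generation fluent.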
What’s Not Said Still Hurts: A Description-Based Evaluation Framework for Measuring Social Bias in LLMs
Jinhao Pan | Chahat Raj | Ziyu Yao | Ziwei Zhu
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Language Models (LLMs) often exhibit social biases inherited from their training data. Existing benchmarks evaluate bias in a term-based mode, through direct associations between demographic terms and bias terms, but LLMs have become increasingly adept at avoiding biased responses in that setting, leading to seemingly low levels of bias. However, biases persist in subtler, contextually hidden forms that traditional benchmarks fail to capture. We introduce the Description-based Bias Benchmark (DBB), a novel dataset designed to assess bias at the semantic level, where bias concepts are hidden within naturalistic, subtly framed contexts in real-world scenarios rather than expressed through superficial terms. We analyze six state-of-the-art LLMs, revealing that while models reduce bias in their responses at the term level, they continue to reinforce biases in nuanced settings. Data, code, and results are available at https://github.com/JP-25/Description-based-Bias-Benchmark.
2024
BiasDora: Exploring Hidden Biased Associations in Vision-Language Models
Chahat Raj | Anjishnu Mukherjee | Aylin Caliskan | Antonios Anastasopoulos | Ziwei Zhu
Findings of the Association for Computational Linguistics: EMNLP 2024
Existing works examining Vision-Language Models (VLMs) for social biases predominantly focus on a limited set of documented bias associations, such as gender-profession or race-crime. This narrow scope often overlooks a vast range of unexamined implicit associations, restricting the identification and, hence, mitigation of such biases. We address this gap by probing VLMs to (1) uncover hidden, implicit associations across 9 bias dimensions, systematically exploring diverse input and output modalities, and (2) demonstrate how biased associations vary in their negativity, toxicity, and extremity. Our work (3) identifies subtle and extreme biases that are typically not recognized by existing methodologies. We make the **D**ataset **o**f **r**etrieved **a**ssociations (**Dora**) publicly available.
2023
Global Voices, Local Biases: Socio-Cultural Prejudices across Languages
Anjishnu Mukherjee | Chahat Raj | Ziwei Zhu | Antonios Anastasopoulos
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Human biases are ubiquitous but not uniform: disparities exist across linguistic, cultural, and societal borders. As a large body of recent literature suggests, language models (LMs) trained on human data can reflect and often amplify the effects of these social biases. However, the vast majority of existing studies on bias are heavily skewed towards Western and European languages. In this work, we scale the Word Embedding Association Test (WEAT) to 24 languages, enabling broader studies and yielding interesting findings about LM bias. We additionally enhance this data with culturally relevant information for each language, capturing local contexts on a global scale. Further, to encompass more widely prevalent societal biases, we examine new bias dimensions across toxicity, ableism, and more. Moreover, we delve deeper into the Indian linguistic landscape, conducting a comprehensive regional bias analysis across six prevalent Indian languages. Finally, we highlight the significance of these social biases and the new dimensions through an extensive comparison of embedding methods, reinforcing the need to address them in pursuit of more equitable language models.
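For readers unfamiliar with WEAT, the standard effect size compares how strongly two target word sets (X, Y) associate with two attribute word sets (A, B) in embedding space. The sketch below follows the commonly used formulation with cosine similarity; the toy two-dimensional vectors and the choice of the sample standard deviation are assumptions for illustration, and nothing here reflects the multilingual data or code of the paper itself.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity to A minus mean similarity to B."""
    sim_a = [cosine(w, a) for a in A]
    sim_b = [cosine(w, b) for b in B]
    return statistics.mean(sim_a) - statistics.mean(sim_b)


def weat_effect_size(X, Y, A, B):
    """Difference of mean associations, normalized by their spread.

    Uses the sample standard deviation over X and Y combined
    (an assumption; some implementations use the population version).
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (statistics.mean(s_x) - statistics.mean(s_y)) / statistics.stdev(s_x + s_y)


# Toy embeddings (made up): X leans toward A, Y leans toward B.
X = [[1.0, 0.0], [0.9, 0.1]]
Y = [[0.0, 1.0], [0.1, 0.9]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]
effect = weat_effect_size(X, Y, A, B)
```

A positive effect size indicates that X is more associated with A (and Y with B) than the reverse; swapping the two target sets flips the sign.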