VIGNETTE: Socially Grounded Bias Evaluation for Vision-Language Models

Chahat Raj; Bowen Wei; Aylin Caliskan; Antonios Anastasopoulos; Ziwei Zhu

Abstract

While bias in large language models (LLMs) is well-studied, similar concerns in vision-language models (VLMs) have received comparatively less attention. Existing VLM bias studies often focus on portrait-style images and gender-occupation associations, overlooking broader and more complex social stereotypes and their implied harm. This work introduces Vignette, a large-scale VQA benchmark with 30M+ images for evaluating bias in VLMs through a question-answering framework spanning four directions: factuality, perception, stereotyping, and decision making. Going beyond narrowly centered studies, we assess how VLMs interpret identities in contextualized settings, revealing how models make trait and capability assumptions and exhibit patterns of discrimination. Drawing from social psychology, we examine how VLMs connect visual identity cues to trait and role-based inferences, encoding social hierarchies through biased selections. Our findings uncover subtle, multifaceted, and surprising stereotypical patterns, offering insights into how VLMs construct social meaning from inputs.

Anthology ID: 2026.acl-long.712
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 15645–15673
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.712/
Cite (ACL): Chahat Raj, Bowen Wei, Aylin Caliskan, Antonios Anastasopoulos, and Ziwei Zhu. 2026. VIGNETTE: Socially Grounded Bias Evaluation for Vision-Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15645–15673, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): VIGNETTE: Socially Grounded Bias Evaluation for Vision-Language Models (Raj et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.712.pdf
Checklist: 2026.acl-long.712.checklist.pdf
