@inproceedings{maurya-etal-2026-colorism,
  title     = {Colorism in Multimodal {AI}: An Empirical Exploration of Socioeconomic Linguistic Bias in Text-to-Image Generation},
  author    = {Maurya, Raj Gaurav and
               Shukla, Vaibhav and
               Panat, Sreedath},
  editor    = {Baez Santamaria, Selene and
               Somayajula, Sai Ashish and
               Yamaguchi, Atsuki},
  booktitle = {Proceedings of the 19th Conference of the {European} Chapter of the {Association} for {Computational} {Linguistics} (Volume 4: Student Research Workshop)},
  month     = mar,
  year      = {2026},
  address   = {Rabat, Morocco},
  publisher = {Association for Computational Linguistics},
  url       = {https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.69/},
  pages     = {937--951},
  isbn      = {979-8-89176-383-8},
  abstract  = {The recent rapid real-world adoption of vision-language models (VLMs) raises concerns about how social biases encoded in language may propagate into visual generation. In this work, we examine whether socioeconomic stereotypes, expressed through occupation and income-related linguistic cues in prompts, systematically influences skin-tone representations in text-to-image (T2I) generation, with a focus on colorism as a visual marker of social inequality. We first benchmark 3 small VLMs and 60 human annotators on the Monk Skin Tone (MST) scale using the MST-E dataset. We then conduct a large-scale T2I generation study in which we systematically vary the linguistic framing of income in prompts describing 210 occupations, producing over 2,500 portraits across 3 large VLMs. The skin-tone audit of the portraits by the best-performing annotator (GPT-5 mini) reveals strong color bias: high-income prompts consistently produce lighter-skinned faces, with prompt constraints only modestly attenuating this effect. Bias magnitude varies across generators, with GPT-5 Image-mini and Gemini-2.5 Flash-Image exhibiting more pronounced shifts in MST than Grok-2 Image. Our findings indicate that VLMs encode and amplify ethnoracialized socioeconomic stereotypes in language-conditioned image generation, underscoring the need for cross-modal fairness audits and human-centered evaluations.},
}

@comment{Web-page residue retained from the ACL Anthology export (not a bibliography entry):
Markdown (Informal)
[Colorism in Multimodal AI: An Empirical Exploration of Socioeconomic Linguistic Bias in Text-to-Image Generation](https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.69/) (Maurya et al., EACL 2026)
ACL}