Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models

Zahraa Al Sahili, Ioannis Patras, Matthew Purver


Abstract
Multilingual vision–language models (VLMs) promise universal image–text retrieval, yet their social biases remain under‐explored.We perform the first systematic audit of four public multilingual CLIP variants—M‐CLIP, NLLB‐CLIP, CAPIVARA‐CLIP, and the debiased SigLIP‐2—covering ten languages that differ in resource availability and morphological gender marking.Using balanced subsets of FairFace and the PATA stereotype suite in a zero‐shot setting, we quantify race and gender bias and measure stereotype amplification.Contrary to the intuition that multilinguality mitigates bias, every model exhibits stronger gender skew than its English‐only baseline.CAPIVARA‐CLIP shows its largest biases precisely in the low‐resource languages it targets, while the shared encoder of NLLB‐CLIP and SigLIP‐2 transfers English gender stereotypes into gender‐neutral languages; loosely coupled encoders largely avoid this leakage.Although SigLIP‐2 reduces agency and communion skews, it inherits—and in caption‐sparse contexts (e.g., Xhosa) amplifies—the English anchor’s crime associations.Highly gendered languages consistently magnify all bias types, yet gender‐neutral languages remain vulnerable whenever cross‐lingual weight sharing imports foreign stereotypes.Aggregated metrics thus mask language‐specific “hot spots,” underscoring the need for fine‐grained, language‐aware bias evaluation in future multilingual VLM research.
Anthology ID:
2025.ijcnlp-long.20
Volume:
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh
Venues:
IJCNLP | AACL
SIG:
Publisher:
The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
Note:
Pages:
331–352
Language:
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-long.20/
DOI:
Bibkey:
Cite (ACL):
Zahraa Al Sahili, Ioannis Patras, and Matthew Purver. 2025. Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 331–352, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
Cite (Informal):
Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models (Al Sahili et al., IJCNLP-AACL 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-long.20.pdf