Zahraa Al Sahili


2025

FairCoT: Enhancing Fairness in Text-to-Image Generation via Chain of Thought Reasoning with Multimodal Large Language Models
Zahraa Al Sahili | Ioannis Patras | Matthew Purver
Findings of the Association for Computational Linguistics: EMNLP 2025

In the domain of text-to-image generative models, biases inherent in training datasets often propagate into generated content, posing significant ethical challenges, particularly in socially sensitive contexts. We introduce FairCoT, a novel framework that enhances fairness in text-to-image models through Chain-of-Thought (CoT) reasoning within multimodal large language models. FairCoT employs iterative CoT refinement to systematically mitigate biases and dynamically adjusts textual prompts in real time, ensuring diverse and equitable representation in generated images. By integrating iterative reasoning processes, FairCoT addresses the limitations of zero-shot CoT in sensitive scenarios, balancing creativity with ethical responsibility. Experimental evaluations across popular text-to-image systems, including DALL-E and various Stable Diffusion variants, demonstrate that FairCoT significantly enhances fairness and diversity without sacrificing image quality or semantic fidelity. By combining robust reasoning, lightweight deployment, and extensibility to multiple models, FairCoT represents a promising step toward more socially responsible and transparent AI-driven content generation.
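The abstract describes iterative CoT refinement that rewrites a user prompt before it is sent to the image generator. The following is a minimal sketch of that idea under stated assumptions: the MLLM interface (`ask_mllm`), the reasoning instructions, and the number of refinement rounds are hypothetical placeholders, not FairCoT's actual implementation.

```python
# Sketch of iterative chain-of-thought prompt refinement prior to
# text-to-image generation. `ask_mllm` is a hypothetical stand-in for a
# call to a multimodal LLM; replace it with a real API client.

def ask_mllm(instruction: str) -> str:
    """Placeholder MLLM call so the sketch runs without external services."""
    return instruction  # echo the instruction back for demonstration

def refine_prompt(user_prompt: str, rounds: int = 3) -> str:
    """Iteratively rewrite a prompt to remove unjustified demographic assumptions."""
    prompt = user_prompt
    for step in range(rounds):
        # Reasoning step: ask the model to surface implicit demographic assumptions.
        critique = ask_mllm(
            f"Step {step + 1}: List assumptions about gender, race, or age "
            f"implied by the prompt: '{prompt}'. Think step by step."
        )
        # Refinement step: rewrite the prompt while preserving its meaning.
        prompt = ask_mllm(
            "Rewrite the prompt to remove unjustified demographic assumptions "
            f"while keeping its intent.\nAssumptions found: {critique}\nPrompt: '{prompt}'"
        )
    return prompt

if __name__ == "__main__":
    refined = refine_prompt("a photo of a CEO giving a speech")
    print(refined)  # the refined prompt would then be passed to DALL-E or Stable Diffusion
```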

Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models
Zahraa Al Sahili | Ioannis Patras | Matthew Purver
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Multilingual vision–language models (VLMs) promise universal image–text retrieval, yet their social biases remain under-explored. We perform the first systematic audit of four public multilingual CLIP variants—M-CLIP, NLLB-CLIP, CAPIVARA-CLIP, and the debiased SigLIP-2—covering ten languages that differ in resource availability and morphological gender marking. Using balanced subsets of FairFace and the PATA stereotype suite in a zero-shot setting, we quantify race and gender bias and measure stereotype amplification. Contrary to the intuition that multilinguality mitigates bias, every model exhibits stronger gender skew than its English-only baseline. CAPIVARA-CLIP shows its largest biases precisely in the low-resource languages it targets, while the shared encoder of NLLB-CLIP and SigLIP-2 transfers English gender stereotypes into gender-neutral languages; loosely coupled encoders largely avoid this leakage. Although SigLIP-2 reduces agency and communion skews, it inherits—and in caption-sparse contexts (e.g., Xhosa) amplifies—the English anchor's crime associations. Highly gendered languages consistently magnify all bias types, yet gender-neutral languages remain vulnerable whenever cross-lingual weight sharing imports foreign stereotypes. Aggregated metrics thus mask language-specific "hot spots," underscoring the need for fine-grained, language-aware bias evaluation in future multilingual VLM research.
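The audit described above classifies demographically balanced face images against stereotype prompts in a zero-shot setting and compares how predictions distribute across groups. Below is a minimal sketch of that style of measurement; the model name, the example prompts, and the per-group rate metric are illustrative assumptions, not the paper's exact protocol or metrics.

```python
# Sketch of a zero-shot CLIP bias audit: rank stereotype prompts for each image
# in a demographically balanced set and compare assignment rates across groups.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# English CLIP shown for illustration; a multilingual variant (e.g. M-CLIP) would be used instead.
model_name = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(model_name)

# Example stereotype probes (hypothetical, not the PATA prompt set).
prompts = ["a photo of a trustworthy person", "a photo of a criminal"]

def top_prompt(image: Image.Image) -> int:
    """Index of the prompt the model ranks highest for this image."""
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_prompts)
    return int(logits.argmax(dim=-1))

def group_rates(images_by_group: dict[str, list[Image.Image]], target: int = 1) -> dict[str, float]:
    """Rate at which each demographic group is assigned the target (stereotype) prompt."""
    return {
        group: sum(top_prompt(img) == target for img in imgs) / len(imgs)
        for group, imgs in images_by_group.items()
    }

# A balanced audit set (e.g. FairFace images keyed by gender or race) would be
# passed to `group_rates`; large gaps between group rates indicate skew.
```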