2025
Intrinsic Bias is Predicted by Pretraining Data and Correlates with Downstream Performance in Vision-Language Encoders
Kshitish Ghate | Isaac Slaughter | Kyra Wilson | Mona T. Diab | Aylin Caliskan
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
While recent work has found that vision-language models trained under the Contrastive Language-Image Pre-training (CLIP) framework contain intrinsic social biases, it has remained unclear how different upstream pre-training features of the framework relate to these biases, and hence how intrinsic bias and downstream performance are connected. In this work, we present the largest comprehensive analysis to date of how the upstream pre-training factors and downstream performance of CLIP models relate to their intrinsic biases. Studying 131 unique CLIP models, trained on 26 datasets, using 55 architectures, and in a variety of sizes, we evaluate bias in each model using 26 well-established, principled unimodal and cross-modal Embedding Association Tests. We find that the choice of pre-training dataset is the most significant upstream predictor of bias, whereas architectural variations have minimal impact. Additionally, datasets curated using sophisticated filtering techniques aimed at enhancing downstream model performance tend to be associated with higher levels of intrinsic bias. Finally, we observe that intrinsic bias is often significantly correlated with downstream performance (0.3 ≤ r ≤ 0.8), suggesting that models optimized for performance inadvertently learn to amplify representational biases. Comparisons between unimodal and cross-modal association tests reveal that social group bias depends heavily on the modality. Our findings imply that more sophisticated strategies are needed to address intrinsic model bias across the entire vision-language model development pipeline.
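For readers unfamiliar with Embedding Association Tests, the sketch below computes a WEAT-style effect size over precomputed embeddings, the kind of statistic such tests report. This is an illustrative reconstruction under assumptions, not the paper's evaluation code: the function names are invented, and embeddings are assumed to arrive as NumPy arrays with one row per stimulus.

```python
import numpy as np

def cos_sim(u: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Cosine similarity between vector u and each row of matrix V."""
    return (V @ u) / (np.linalg.norm(V, axis=1) * np.linalg.norm(u))

def association(w: np.ndarray, A: np.ndarray, B: np.ndarray) -> float:
    """s(w, A, B): mean similarity to attribute set A minus attribute set B."""
    return cos_sim(w, A).mean() - cos_sim(w, B).mean()

def eat_effect_size(X: np.ndarray, Y: np.ndarray,
                    A: np.ndarray, B: np.ndarray) -> float:
    """Cohen's-d-style effect size of differential association.

    X, Y: target-concept embeddings (e.g. images of two social groups).
    A, B: attribute embeddings (e.g. pleasant/unpleasant terms).
    In a cross-modal test X/Y and A/B come from different encoders of the
    same CLIP model; in a unimodal test they come from the same encoder.
    """
    s_X = np.array([association(x, A, B) for x in X])
    s_Y = np.array([association(y, A, B) for y in Y])
    pooled = np.concatenate([s_X, s_Y])
    return float((s_X.mean() - s_Y.mean()) / pooled.std(ddof=1))
```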
2024
Evaluating Gender Bias in Multilingual Multimodal AI Models: Insights from an Indian Context
Kshitish Ghate | Arjun Choudhry | Vanya Bannihatti Kumar
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
We evaluate gender biases in multilingual multimodal image and text models in two settings, text-to-image retrieval and text-to-image generation, showing that even seemingly gender-neutral traits yield biased results. We evaluate our framework in the context of people from India, working with two languages: English and Hindi. We work with frameworks built around mCLIP-based models, chosen for their potential for widespread application, to ensure a thorough evaluation of recent state-of-the-art models in the multilingual setting. We analyze results across 50 traits for retrieval and 8 traits for generation, showing that current multilingual multimodal models are biased towards men for most traits, and that this problem is further exacerbated for lower-resource languages like Hindi. We further discuss potential reasons behind this observation, particularly bias introduced by the pretraining datasets.
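As one way to make the retrieval setting concrete, the sketch below measures gender skew in the top-k images retrieved for a trait prompt; a value far from 0.5 for a gender-neutral trait would indicate bias. This is a minimal sketch under assumptions: the function name, the string gender labels, and the shape of the precomputed mCLIP embeddings are all illustrative, not the paper's framework.

```python
import numpy as np

def gender_skew_at_k(text_emb: np.ndarray,
                     image_embs: np.ndarray,
                     genders: np.ndarray,
                     k: int = 100) -> float:
    """Fraction of images labeled 'man' among the top-k retrieved for a prompt.

    text_emb:   embedding of a trait prompt, e.g. "a photo of an honest person"
                (in English or Hindi), from an mCLIP text encoder.
    image_embs: one row per image, from the matching image encoder.
    genders:    array of labels such as "man" / "woman", aligned with image_embs.
    """
    # Normalize, then rank images by cosine similarity to the text prompt.
    t = text_emb / np.linalg.norm(text_emb)
    V = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    top_k = np.argsort(-(V @ t))[:k]
    return float((genders[top_k] == "man").mean())
```

Comparing this statistic for the same trait prompt across English and Hindi is one way to expose the cross-lingual gap the abstract describes.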
Calc-CMU at SemEval-2024 Task 7: Pre-Calc - Learning to Use the Calculator Improves Numeracy in Language Models
Vishruth Veerendranath | Vishwa Shah | Kshitish Ghate
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Quantitative and numerical comprehension in language is important in many fields, such as education and finance, but remains challenging for language models. While tool and calculator use has been shown to improve mathematical reasoning in large pretrained decoder-only language models, it remains unexplored for smaller language models with encoders. In this paper, we propose Pre-Calc, a simple pre-finetuning objective of learning to use the calculator for both encoder-only and encoder-decoder architectures, formulated as a discriminative and a generative task respectively. We pre-train BERT and RoBERTa for discriminative calculator use and Flan-T5 for generative calculator use on the MAWPS, SVAMP, and AsDiv-A datasets, which improves performance on downstream tasks that require numerical understanding. Our code and data are available at https://github.com/calc-cmu/pre-calc.
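One plausible reading of the discriminative formulation for encoder-only models is sketched below: calculator use is cast as jointly classifying which arithmetic operation a word problem requires and tagging which tokens are its operands. The head design, label set, and class names here are assumptions for illustration only; the paper's actual implementation is in the linked repository.

```python
import torch.nn as nn
from transformers import AutoModel

OPERATIONS = ["+", "-", "*", "/"]  # assumed label set, for illustration

class PreCalcEncoder(nn.Module):
    """Sketch of a discriminative calculator-use objective on an encoder.

    Two heads over a BERT-style encoder: a sequence-level classifier that
    predicts the required arithmetic operation, and a token-level tagger
    that marks which input tokens serve as operands.
    """
    def __init__(self, base: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base)
        h = self.encoder.config.hidden_size
        self.op_head = nn.Linear(h, len(OPERATIONS))  # operation classifier
        self.operand_head = nn.Linear(h, 2)           # operand / not-operand per token

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]          # [CLS] representation
        return self.op_head(pooled), self.operand_head(out.last_hidden_state)
```

Both heads can be trained with standard cross-entropy during pre-finetuning, after which the encoder is fine-tuned on downstream numeracy tasks.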