Uncovering Cultural Representation Disparities in Vision-Language Models
Ram Mohan Rao Kadiyala, Siddhant Gupta, Jebish Purbey, Srishti Yadav, Suman Debnath, Alejandro R. Salamanca, Desmond Elliott
Abstract
Vision-Language Models (VLMs) have demonstrated impressive capabilities across a range of tasks, yet concerns about their potential biases persist. This work investigates the cultural biases in state-of-the-art VLMs by evaluating their performance on an image-based country identification task at the country level. Utilizing the geographically diverse Country211 (CITATION) dataset, we probe VLMs via open-ended questions, multiple-choice questions (MCQs), and include challenging multilingual and adversarial task settings. Our analysis aims to uncover disparities in model accuracy across different countries and question formats, providing insights into how training data distribution and evaluation methodologies may influence cultural biases in VLMs. The findings highlight significant variations in performance, suggesting that while VLMs possess considerable visual understanding, they inherit biases from their pre-training data and scale, which impact their ability to generalize uniformly across diverse global contexts.- Anthology ID:
- 2025.findings-ijcnlp.131
- Volume:
- Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
- Month:
- December
- Year:
- 2025
- Address:
- Mumbai, India
- Editors:
- Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh
- Venue:
- Findings
- SIG:
- Publisher:
- The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
- Note:
- Pages:
- 2087–2117
- Language:
- URL:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.findings-ijcnlp.131/
- DOI:
- Cite (ACL):
- Ram Mohan Rao Kadiyala, Siddhant Gupta, Jebish Purbey, Srishti Yadav, Suman Debnath, Alejandro R. Salamanca, and Desmond Elliott. 2025. Uncovering Cultural Representation Disparities in Vision-Language Models. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 2087–2117, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
- Cite (Informal):
- Uncovering Cultural Representation Disparities in Vision-Language Models (Kadiyala et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.findings-ijcnlp.131.pdf