ColorFoil: Investigating Color Blindness in Large Vision and Language Models

Ahnaf Mozib Samin; M Firoz Ahmed; Md. Mushtaq Shahriyar Rafee

Abstract

Built on the Transformer architecture, large Vision and Language (V&L) models have shown promising performance even in zero-shot settings. Several studies, however, indicate that these models lack robustness when dealing with complex linguistic and visual attributes. In this work, we introduce ColorFoil, a novel V&L benchmark that uses color-related foils to assess the models’ ability to detect colors such as red, white, and green. We evaluate seven state-of-the-art V&L models, including CLIP, ViLT, GroupViT, and BridgeTower, in a zero-shot setting. The experimental evaluation indicates that ViLT and BridgeTower demonstrate much better color perception than CLIP, its variants, and GroupViT. Moreover, CLIP-based models and GroupViT struggle to distinguish colors that are visually distinct to humans with normal color perception.
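The abstract does not spell out how the foils are built or scored, so the following is only a minimal sketch of the general idea, under stated assumptions: a foil is formed by swapping one color word in a caption for a different color, and a model passes an example if it scores the original caption above the foil for the same image. The color list, the function names `make_color_foil` and `prefers_caption`, and the `score_fn` callback are all hypothetical and are not taken from the paper.

```python
import random
import re

# Illustrative color vocabulary (an assumption; the paper's list is not given here).
COLORS = ["red", "white", "green", "blue", "black", "yellow"]
COLOR_RE = re.compile(r"\b(" + "|".join(COLORS) + r")\b", re.IGNORECASE)


def make_color_foil(caption, rng):
    """Return `caption` with its first color word swapped for a different
    color, or None if the caption mentions no color from COLORS."""
    match = COLOR_RE.search(caption)
    if match is None:
        return None
    original = match.group(0).lower()
    replacement = rng.choice([c for c in COLORS if c != original])
    return caption[:match.start()] + replacement + caption[match.end():]


def prefers_caption(score_fn, image, caption, foil):
    """Zero-shot check: True if the model scores the true caption above the foil.

    `score_fn(image, text)` is a hypothetical stand-in for any image-text
    matching score produced by a V&L model.
    """
    return score_fn(image, caption) > score_fn(image, foil)
```

For example, `make_color_foil("A red bus on the street", random.Random(0))` returns the same sentence with "red" replaced by another color from the list; accuracy over a dataset would then be the fraction of image-caption pairs for which `prefers_caption` is true.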

Anthology ID: 2025.naacl-srw.29
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month: April
Year: 2025
Address: Albuquerque, USA
Editors: Abteen Ebrahimi, Samar Haider, Emmy Liu, Sammar Haider, Maria Leonor Pacheco, Shira Wein
Venues: NAACL | WS
Publisher: Association for Computational Linguistics
Pages: 294–300
URL: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.naacl-srw.29/
Cite (ACL): Ahnaf Mozib Samin, M Firoz Ahmed, and Md. Mushtaq Shahriyar Rafee. 2025. ColorFoil: Investigating Color Blindness in Large Vision and Language Models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 294–300, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal): ColorFoil: Investigating Color Blindness in Large Vision and Language Models (Samin et al., NAACL 2025)
PDF: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.naacl-srw.29.pdf
