PRISM: A New Lens for Improved Color Understanding
Arjun Reddy Akula, Garima Pruthi, Inderjit S Dhillon, Pradyumna Narayana, Sugato Basu, Varun Jampani
Abstract
While image-text pre-trained models, such as CLIP, have demonstrated impressive capabilities in learning robust text and image representations, a critical area for substantial improvement remains: precise color understanding. In this paper, we address this limitation by introducing PRISM, a simple yet highly effective method that extends CLIP’s capability to grasp the nuances of precise colors. PRISM seamlessly adapts to both recognized HTML colors and out-of-vocabulary RGB inputs through the utilization of our curated dataset of 100 image-text pairs, which can be effortlessly repurposed for fine-tuning with any desired color. Importantly, PRISM achieves these enhancements without compromising CLIP’s performance on established benchmarks. Furthermore, we introduce a novel evaluation framework, ColorLens, featuring both seen and unseen test sets that can be readily repurposed to assess a model’s precision in understanding precise colors. Our comprehensive evaluation and results demonstrate significant improvements over baseline models.
- Anthology ID: 2024.emnlp-industry.121
- Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month: November
- Year: 2024
- Address: Miami, Florida, US
- Editors: Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 1659–1670
- URL: https://preview.aclanthology.org/ingest_wac_2008/2024.emnlp-industry.121/
- DOI: 10.18653/v1/2024.emnlp-industry.121
- Cite (ACL): Arjun Reddy Akula, Garima Pruthi, Inderjit S Dhillon, Pradyumna Narayana, Sugato Basu, and Varun Jampani. 2024. PRISM: A New Lens for Improved Color Understanding. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1659–1670, Miami, Florida, US. Association for Computational Linguistics.
- Cite (Informal): PRISM: A New Lens for Improved Color Understanding (Akula et al., EMNLP 2024)
- PDF: https://preview.aclanthology.org/ingest_wac_2008/2024.emnlp-industry.121.pdf