Benchmarking the Fine-Grained Discriminability in Image-Text Retrieval via Controlled Contrastive Differences
Zhen Wang, Xi Zhou, Yating Yang, Bo Ma, Lei Wang, Rui Dong, Azmat Anwar, Siru Miao
Abstract
Existing cross-modal image-text retrieval models often retrieve samples with inconsistent details. To evaluate fine-grained discriminability, we introduce MSCOCO-CCD and Flickr30k-CCD, with three key features: (1) a two-level image content taxonomy for contrastive sample generation and fine-grained evaluation; (2) annotation of numerous contrastive samples, where each sample differs from the anchor by a controlled contrastive difference (CCD), with the specific type of difference labeled; (3) a fine-grained contrastive discrimination metric to assess the ability to distinguish fine-grained nuances. Extensive experiments demonstrate that contrastive samples can significantly degrade retrieval performance. Furthermore, fine-grained evaluation reveals that current models still struggle to effectively produce discriminative representations on certain feature types, such as entity emotion and scene attribute. Our datasets and related code will be publicly released.
- Anthology ID: 2026.findings-acl.1525
- Volume: Findings of the Association for Computational Linguistics: ACL 2026
- Month: July
- Year: 2026
- Address: San Diego, California, United States
- Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 30491–30503
- URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1525/
- Cite (ACL): Zhen Wang, Xi Zhou, Yating Yang, Bo Ma, Lei Wang, Rui Dong, Azmat Anwar, and Siru Miao. 2026. Benchmarking the Fine-Grained Discriminability in Image-Text Retrieval via Controlled Contrastive Differences. In Findings of the Association for Computational Linguistics: ACL 2026, pages 30491–30503, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal): Benchmarking the Fine-Grained Discriminability in Image-Text Retrieval via Controlled Contrastive Differences (Wang et al., Findings 2026)
- PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1525.pdf