A Diagnostic Framework for Auditing Reference-Free Vision-Language Metrics

Angeline Charles, Srikant Panda, Amit Agarwal, Hitesh Laxmichand Patel, Priyaranjan Pattnayak, Bhargava Kumar, Tejaswini Kumar


Abstract
Reference-free metrics such as CLIPScore and PAC-S are increasingly used in vision-language tasks due to their scalability and independence from human-written references. However, their reliability under linguistic, visual, and cultural variation remains underexplored. In this work, we present a systematic audit of CLIPScore and PAC-S using an eight-factor diagnostic framework applied to MS-COCO validation images. Our analysis reveals consistent failure modes across dimensions including object size, content category, syntax, named entities, spatial relations and cultural context. Both metrics penalize captions referencing African (−5.5%, −4.8%) and Arabian (−4.9%, −5.3%) cultures, favor large-object and animal-centric scenes (by 20-30%) and show limited sensitivity to spatial negation and word order. CLIPScore correlates more strongly with syntactic complexity, while PAC-S demonstrates greater robustness to verbosity and named–entity variation highlighting complementary strengths rather than superiority. These findings expose cultural and content bias, weak semantic robustness, and limited compositional understanding. We conclude with design recommendations to improve fairness, scale invariance, and semantic grounding in future reference-free evaluation metrics.
Anthology ID:
2025.ijcnlp-long.142
Volume:
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh
Venues:
IJCNLP | AACL
SIG:
Publisher:
The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
Note:
Pages:
2633–2644
Language:
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-long.142/
DOI:
Bibkey:
Cite (ACL):
Angeline Charles, Srikant Panda, Amit Agarwal, Hitesh Laxmichand Patel, Priyaranjan Pattnayak, Bhargava Kumar, and Tejaswini Kumar. 2025. A Diagnostic Framework for Auditing Reference-Free Vision-Language Metrics. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 2633–2644, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
Cite (Informal):
A Diagnostic Framework for Auditing Reference-Free Vision-Language Metrics (Charles et al., IJCNLP-AACL 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-long.142.pdf