VC-Inspector: Advancing Reference-free Evaluation of Video Captions with Factual Analysis

Shubhashis Roy Dipta; Tz-Ying Wu; Subarna Tripathi

Abstract

We propose VC-Inspector, a lightweight, open-source large multimodal model (LMM) for reference-free evaluation of video captions, with a focus on factual accuracy. Unlike existing metrics that suffer from limited context handling, weak factuality assessment, or reliance on proprietary services, VC-Inspector offers a reproducible, fact-aware alternative that aligns closely with human judgments. To enable robust training and interpretable evaluation, we introduce a systematic approach for generating captions with controllable errors, paired with graded quality scores and explanatory annotations. Experiments show that VC-Inspector achieves state-of-the-art correlation with human judgments, generalizing across diverse domains (e.g., VATEX-Eval, Flickr8K-Expert, and Flickr8K-CF benchmarks) and revealing the potential for caption improvement.

Anthology ID: 2026.acl-long.1552
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 33657–33672
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1552/
Cite (ACL): Shubhashis Roy Dipta, Tz-Ying Wu, and Subarna Tripathi. 2026. VC-Inspector: Advancing Reference-free Evaluation of Video Captions with Factual Analysis. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 33657–33672, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): VC-Inspector: Advancing Reference-free Evaluation of Video Captions with Factual Analysis (Dipta et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1552.pdf
Checklist: 2026.acl-long.1552.checklist.pdf