QACE: Asking Questions to Evaluate an Image Caption

Hwanhee Lee, Thomas Scialom, Seunghyun Yoon, Franck Dernoncourt, Kyomin Jung


Abstract
In this paper, we propose QACE, a new metric based on Question Answering for Caption Evaluation, which evaluates image captions using Question Generation (QG) and Question Answering (QA) systems. QACE generates questions on the evaluated caption and checks its content by asking the questions on either the reference caption or the source image. We first develop QACE_Ref, which compares the answers of the evaluated caption to those of its reference, and report results competitive with state-of-the-art metrics. To go further, we propose QACE_Img, which asks the questions directly on the image instead of the reference. A Visual-QA system is necessary for QACE_Img. Unfortunately, standard VQA models are framed as classification among only a few thousand categories. Instead, we propose Visual-T5, an abstractive VQA system. The resulting metric, QACE_Img, is multi-modal, reference-less, and explainable. Our experiments show that QACE_Img compares favorably with other reference-less metrics.
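The question-then-answer loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `toy_generate_qa_pairs`, `toy_answer`, and `qace_like_score` are hypothetical, the cloze-style question generator and word-lookup answerer are toy stand-ins for the neural QG and QA models, and exact match is assumed as the answer-comparison function. The context string plays the role of the reference caption, as in QACE_Ref.

```python
from typing import Callable, List, Tuple

# Assumption: a tiny stopword list, used only to pick "content" words to ask about.
STOPWORDS = {"a", "an", "the", "on", "in", "of", "is", "and"}
MASK = "[MASK]"


def toy_generate_qa_pairs(caption: str) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for a QG model: one cloze question per content word."""
    tokens = caption.lower().split()
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in STOPWORDS:
            continue
        question = " ".join(tokens[:i] + [MASK] + tokens[i + 1:])
        pairs.append((question, tok))  # (question, answer according to the candidate)
    return pairs


def toy_answer(question: str, context: str) -> str:
    """Hypothetical stand-in for a QA model: first content word of the
    context that does not already appear in the question."""
    question_tokens = set(question.split())
    for tok in context.lower().split():
        if tok not in STOPWORDS and tok not in question_tokens:
            return tok
    return "unanswerable"


def qace_like_score(
    candidate: str,
    context: str,
    generate: Callable[[str], List[Tuple[str, str]]] = toy_generate_qa_pairs,
    answer: Callable[[str, str], str] = toy_answer,
) -> float:
    """Fraction of candidate-derived questions whose answer on the context
    matches the candidate's own answer (exact match is an assumption)."""
    pairs = generate(candidate)
    if not pairs:
        return 0.0
    matches = sum(1 for q, expected in pairs if answer(q, context) == expected)
    return matches / len(pairs)


if __name__ == "__main__":
    # "dog" is confirmed by the reference, "couch" is not.
    print(qace_like_score("a dog on a couch", "a dog on a sofa"))
```

Because each question is tied to a specific span of the candidate caption, a mismatched answer points to the word that the reference does not support, which is one way to read the "explainable" property claimed above. Swapping the toy answerer for a visual QA model that takes an image as context would correspond to the reference-less QACE_Img setting.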
Anthology ID:
2021.findings-emnlp.395
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4631–4638
URL:
https://aclanthology.org/2021.findings-emnlp.395
DOI:
10.18653/v1/2021.findings-emnlp.395
Cite (ACL):
Hwanhee Lee, Thomas Scialom, Seunghyun Yoon, Franck Dernoncourt, and Kyomin Jung. 2021. QACE: Asking Questions to Evaluate an Image Caption. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4631–4638, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
QACE: Asking Questions to Evaluate an Image Caption (Lee et al., Findings 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.findings-emnlp.395.pdf
Video:
https://preview.aclanthology.org/ingestion-script-update/2021.findings-emnlp.395.mp4
Code:
hwanheelee1993/qace
Data:
SQuAD, Visual Question Answering