Abstract
Multimodal Large Language Models (MLLMs) are commonly evaluated using costly annotated multimodal benchmarks. However, these benchmarks often struggle to keep pace with the rapidly advancing requirements of MLLM evaluation. We propose GenCeption, a novel and annotation-free MLLM evaluation framework that requires only unimodal data to assess inter-modality semantic coherence and inversely reflects a model’s inclination to hallucinate. Analogous to the popular DrawCeption game, GenCeption begins with a non-textual sample and proceeds through a series of iterative description and generation steps. Semantic drift across iterations is quantified using the GC@T metric. Our empirical findings validate GenCeption’s efficacy, showing strong correlations with popular MLLM benchmarking results. GenCeption may be extended to mitigate training-data contamination by utilizing ubiquitous, previously unseen unimodal data.
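To make the iterative procedure concrete, the Python sketch below runs T describe-then-generate rounds starting from a seed image and scores semantic coherence against that seed. The callables `describe_image`, `generate_image`, and `embed` are hypothetical placeholders for the MLLM under test, a text-to-image generator, and an image encoder; the simple average used to aggregate per-iteration similarities is an assumption for illustration, and the paper’s exact GC@T weighting may differ.

```python
# Minimal sketch of the GenCeption loop described in the abstract.
# describe_image, generate_image, and embed are hypothetical stand-ins
# for the MLLM under evaluation, a fixed text-to-image model, and a
# CLIP-style image encoder. The mean aggregation below is an assumed
# simplification of the GC@T metric, not the paper's exact definition.

import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def genception(seed_image, describe_image, generate_image, embed, T: int = 5) -> float:
    """Run T describe->generate iterations from a non-textual seed sample
    and return a GC@T-style coherence score (higher = less semantic drift)."""
    seed_emb = embed(seed_image)
    image, sims = seed_image, []
    for _ in range(T):
        caption = describe_image(image)   # MLLM describes the current image
        image = generate_image(caption)   # generator re-creates an image
        sims.append(cosine(seed_emb, embed(image)))  # drift vs. the seed
    return float(np.mean(sims))  # assumed aggregation over T iterations
```

Because the MLLM never sees the seed image after the first round, any hallucinated detail in its descriptions compounds across iterations, so a faster drop in similarity (a lower score) signals weaker inter-modality semantic coherence.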
- Anthology ID: 2024.trustnlp-1.16
- Volume: Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024)
- Month: June
- Year: 2024
- Address: Mexico City, Mexico
- Editors: Anaelia Ovalle, Kai-Wei Chang, Yang Trista Cao, Ninareh Mehrabi, Jieyu Zhao, Aram Galstyan, Jwala Dhamala, Anoop Kumar, Rahul Gupta
- Venues: TrustNLP | WS
- Publisher: Association for Computational Linguistics
- Pages: 196–201
- URL: https://aclanthology.org/2024.trustnlp-1.16
- DOI: 10.18653/v1/2024.trustnlp-1.16
- Cite (ACL): Lele Cao, Valentin Buchner, Zineb Senane, and Fangkai Yang. 2024. Introducing GenCeption for Multimodal LLM Benchmarking: You May Bypass Annotations. In Proceedings of the 4th Workshop on Trustworthy Natural Language Processing (TrustNLP 2024), pages 196–201, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal): Introducing GenCeption for Multimodal LLM Benchmarking: You May Bypass Annotations (Cao et al., TrustNLP-WS 2024)
- PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2024.trustnlp-1.16.pdf