Automatic, Meta and Human Evaluation for Multimodal Summarization with Multimodal Output

Haojie Zhuang, Wei Emma Zhang, Leon Xie, Weitong Chen, Jian Yang, Quan Sheng


Abstract
Multimodal summarization with multimodal output (MSMO) has attracted increasing research interest recently, as multimodal summaries can provide more comprehensive information than text-only summaries, effectively improving user experience and satisfaction. As one of the most fundamental components for the development of MSMO, evaluation is an emerging yet underexplored research topic. In this paper, we fill this gap and propose a research framework that studies three research questions of MSMO evaluation: (1) Automatic Evaluation: We propose a novel metric, mLLM-EVAL, which utilizes a multimodal Large Language Model for MSMO EVALuation. (2) Meta-Evaluation: We create a meta-evaluation benchmark dataset by collecting human-annotated scores for multimodal summaries. With our benchmark, we conduct meta-evaluation analysis to assess the quality of different evaluation metrics and show the effectiveness of our proposed mLLM-EVAL. (3) Human Evaluation: To provide more objective and unbiased human annotations for meta-evaluation, we hypothesize and verify three types of cognitive biases in human evaluation. We also incorporate our findings into the human annotation process in the meta-evaluation benchmark. Overall, our research framework provides an evaluation metric, a human-annotated meta-evaluation benchmark dataset, and an analysis of cognitive biases in human evaluation, which we believe will serve as a valuable and comprehensive resource for the MSMO research community.
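The meta-evaluation described above rates an automatic metric by how well its scores agree with human judgments on the benchmark. The snippet below is a minimal sketch of that correlation step, not the authors' released code; the score lists are illustrative placeholders rather than data from the paper.

# Minimal sketch of meta-evaluation: correlate an automatic metric's
# scores with human-annotated scores for the same multimodal summaries.
# The values below are illustrative placeholders, not benchmark data.
from scipy.stats import pearsonr, spearmanr, kendalltau

metric_scores = [0.62, 0.71, 0.45, 0.83, 0.58]   # e.g., hypothetical mLLM-EVAL outputs
human_scores  = [3.8, 4.2, 2.9, 4.6, 3.5]        # e.g., averaged annotator ratings

print("Pearson r:    %.3f" % pearsonr(metric_scores, human_scores)[0])
print("Spearman rho: %.3f" % spearmanr(metric_scores, human_scores)[0])
print("Kendall tau:  %.3f" % kendalltau(metric_scores, human_scores)[0])

A metric whose scores track human ratings more closely (higher correlation) is judged a better evaluation metric under this protocol.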
Anthology ID:
2024.naacl-long.430
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
7768–7790
URL:
https://aclanthology.org/2024.naacl-long.430
DOI:
10.18653/v1/2024.naacl-long.430
Cite (ACL):
Haojie Zhuang, Wei Emma Zhang, Leon Xie, Weitong Chen, Jian Yang, and Quan Sheng. 2024. Automatic, Meta and Human Evaluation for Multimodal Summarization with Multimodal Output. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 7768–7790, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Automatic, Meta and Human Evaluation for Multimodal Summarization with Multimodal Output (Zhuang et al., NAACL 2024)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2024.naacl-long.430.pdf