Multi-VQG: Generating Engaging Questions for Multiple Images

Min-Hsuan Yeh; Vincent Chen; Ting-Hao Huang; Lun-Wei Ku

doi:10.18653/v1/2022.emnlp-main.19

Abstract

Generating engaging content has drawn much recent attention in the NLP community. Asking questions is a natural way to respond to photos and promote awareness. However, most answers to questions in traditional question-answering (QA) datasets are factoids, which reduce individuals’ willingness to answer. Furthermore, traditional visual question generation (VQG) confines the source data for question generation to single images, resulting in a limited ability to comprehend time-series information of the underlying event. In this paper, we propose generating engaging questions from multiple images. We present MVQG, a new dataset, and establish a series of baselines, including both end-to-end and dual-stage architectures. Results show that building stories behind the image sequence enables models to generate engaging questions, which confirms our assumption that people typically construct a picture of the event in their minds before asking questions. These results open up an exciting challenge for visual-and-language models to implicitly construct a story behind a series of photos to allow for creativity and experience sharing and hence draw attention to downstream applications.

Anthology ID: 2022.emnlp-main.19
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 277–290
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.emnlp-main.19/
DOI: 10.18653/v1/2022.emnlp-main.19
Cite (ACL): Min-Hsuan Yeh, Vincent Chen, Ting-Hao Huang, and Lun-Wei Ku. 2022. Multi-VQG: Generating Engaging Questions for Multiple Images. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 277–290, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Multi-VQG: Generating Engaging Questions for Multiple Images (Yeh et al., EMNLP 2022)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.emnlp-main.19.pdf
