Diversity and Consistency: Exploring Visual Question-Answer Pair Generation

Sen Yang, Qingyu Zhou, Dawei Feng, Yang Liu, Chao Li, Yunbo Cao, Dongsheng Li


Abstract
Although it shows promise for downstream applications, generating questions and answers together remains under-explored. In this paper, we introduce a novel task: question-answer pair generation from images. The task requires generating question-answer pairs that are diverse while keeping each question consistent with its answer. We study different generation paradigms for this task and propose three models: the pipeline model, the joint model, and the sequential model. We integrate variational inference into these models to achieve diversity and consistency. We also propose region representation scaling and attention alignment to further improve consistency. Finally, we devise an evaluator that serves as a quantitative metric for consistency. We validate our approach on two benchmarks, VQA2.0 and Visual-7w, evaluating diversity and consistency both automatically and manually. Experimental results show the effectiveness of our models: they can generate diverse or consistent pairs. Moreover, this task can be used to improve visual question generation and visual question answering.
Anthology ID:
2021.findings-emnlp.91
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1053–1066
URL:
https://aclanthology.org/2021.findings-emnlp.91
DOI:
10.18653/v1/2021.findings-emnlp.91
Cite (ACL):
Sen Yang, Qingyu Zhou, Dawei Feng, Yang Liu, Chao Li, Yunbo Cao, and Dongsheng Li. 2021. Diversity and Consistency: Exploring Visual Question-Answer Pair Generation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1053–1066, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Diversity and Consistency: Exploring Visual Question-Answer Pair Generation (Yang et al., Findings 2021)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2021.findings-emnlp.91.pdf
Video:
 https://preview.aclanthology.org/emnlp-22-attachments/2021.findings-emnlp.91.mp4
Data
VQG, Visual Question Answering v2.0, Visual7W