CrossVQA: Scalably Generating Benchmarks for Systematically Testing VQA Generalization
Arjun Akula, Soravit Changpinyo, Boqing Gong, Piyush Sharma, Song-Chun Zhu, Radu Soricut
Abstract
One challenge in evaluating visual question answering (VQA) models in the cross-dataset adaptation setting is that the distribution shifts are multi-modal, making it difficult to identify if it is the shifts in visual or language features that play a key role. In this paper, we propose a semi-automatic framework for generating disentangled shifts by introducing a controllable visual question-answer generation (VQAG) module that is capable of generating highly-relevant and diverse question-answer pairs with the desired dataset style. We use it to create CrossVQA, a collection of test splits for assessing VQA generalization based on the VQA2, VizWiz, and Open Images datasets. We provide an analysis of our generated datasets and demonstrate its utility by using them to evaluate several state-of-the-art VQA systems. One important finding is that the visual shifts in cross-dataset VQA matter more than the language shifts. More broadly, we present a scalable framework for systematically evaluating the machine with little human intervention.- Anthology ID:
- 2021.emnlp-main.164
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2148–2166
- URL:
- https://aclanthology.org/2021.emnlp-main.164
- DOI:
- 10.18653/v1/2021.emnlp-main.164
- Cite (ACL):
- Arjun Akula, Soravit Changpinyo, Boqing Gong, Piyush Sharma, Song-Chun Zhu, and Radu Soricut. 2021. CrossVQA: Scalably Generating Benchmarks for Systematically Testing VQA Generalization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2148–2166, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- CrossVQA: Scalably Generating Benchmarks for Systematically Testing VQA Generalization (Akula et al., EMNLP 2021)
- PDF:
- https://aclanthology.org/2021.emnlp-main.164.pdf
- Data
- GQA, MS COCO, VCR, VQG, Visual Question Answering, Visual Question Answering v2.0, VizWiz