Abstract
While progress has been made on visual question answering leaderboards, models often exploit spurious correlations and priors in datasets under the i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar yet semantically distinct mutations of the input to improve OOD generalization on benchmarks such as the VQA-CP challenge. Under this paradigm, models use a consistency-constrained training objective to understand the effect of semantic changes in the input (question-image pair) on the output (answer). Unlike existing methods for VQA-CP, MUTANT does not rely on knowledge about the nature of the train and test answer distributions. MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a 10.57% improvement. Our work opens up avenues for the use of semantic input mutations for OOD generalization in question answering.
- Anthology ID: 2020.emnlp-main.63
- Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Month: November
- Year: 2020
- Address: Online
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 878–892
- URL: https://aclanthology.org/2020.emnlp-main.63
- DOI: 10.18653/v1/2020.emnlp-main.63
- Cite (ACL): Tejas Gokhale, Pratyay Banerjee, Chitta Baral, and Yezhou Yang. 2020. MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 878–892, Online. Association for Computational Linguistics.
- Cite (Informal): MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering (Gokhale et al., EMNLP 2020)
- PDF: https://preview.aclanthology.org/emnlp-22-attachments/2020.emnlp-main.63.pdf
- Code: tejasG53/vqa_mutant + additional community code
- Data: GQA, Visual Question Answering
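To make the abstract's idea of a consistency-constrained training objective concrete, the following is a minimal sketch, not the authors' exact formulation. It assumes a hypothetical loss that combines answer classification on an original and a mutated sample with a KL-based consistency term; the function names and the choice of KL divergence are illustrative assumptions, and the paper should be consulted for the actual objective.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, label):
    """Negative log-probability of the gold answer index."""
    p = softmax(logits)
    return -np.log(p[label] + 1e-12)

def mutant_style_loss(orig_logits, mut_logits, orig_label, mut_label, lam=1.0):
    """Hypothetical consistency-constrained objective (illustrative only):
    answer-classification loss on both the original and the mutated sample,
    plus a term tying the two predictive distributions together so the model
    must account for the semantic effect of the mutation."""
    ce = cross_entropy(orig_logits, orig_label) + cross_entropy(mut_logits, mut_label)
    consistency = 0.0
    # Apply the consistency term only when the mutation leaves the answer
    # unchanged: KL divergence from the original to the mutant distribution.
    if orig_label == mut_label:
        p, q = softmax(orig_logits), softmax(mut_logits)
        consistency = float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))
    return ce + lam * consistency
```

When the mutation changes the answer (e.g. removing the queried object), only the two classification terms apply; when it does not, the consistency term penalizes divergent predictions on perceptually similar inputs.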