Can Pre-training help VQA with Lexical Variations?

Shailza Jolly, Shubham Kapoor


Abstract
Rephrasings, or paraphrases, are sentences with similar meanings expressed in different ways. Visual Question Answering (VQA) models are closing the gap with oracle performance on datasets such as VQA2.0. However, these models fail to perform well on rephrasings of a question, which raises important questions: are these models robust to linguistic variations? Is it the architecture or the dataset that we need to optimize? In this paper, we analyze VQA models in the space of paraphrasing. We explore the role of language and cross-modal pre-training to investigate the robustness of VQA models to lexical variations. Our experiments find that pre-trained language encoders generate efficient representations of question rephrasings, which help VQA models correctly infer these samples. We empirically determine why pre-training language encoders improves lexical robustness. Finally, we observe that although pre-training all VQA components obtains state-of-the-art results on the VQA-Rephrasings dataset, it still fails to completely close the performance gap between the original and rephrasing validation splits.
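The abstract's claim that pre-trained language encoders produce useful representations of question rephrasings can be probed with a simple similarity check. The sketch below is illustrative only and not the paper's code: it assumes the HuggingFace `transformers` library and the `bert-base-uncased` checkpoint, and compares a question with one of its rephrasings via cosine similarity of mean-pooled encoder states.

```python
# Illustrative sketch (not from the paper): probe whether a pre-trained
# language encoder maps a question and its rephrasing to nearby vectors.
# Assumes the HuggingFace `transformers` library and `bert-base-uncased`.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

def embed(question: str) -> torch.Tensor:
    """Mean-pool the last hidden states into a single question vector."""
    inputs = tokenizer(question, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)              # (768,)

# Hypothetical question/rephrasing pair in the style of VQA-Rephrasings.
original = "What color is the man's umbrella?"
rephrasing = "The umbrella the man is holding is what color?"

similarity = torch.nn.functional.cosine_similarity(
    embed(original), embed(rephrasing), dim=0
)
print(f"cosine similarity: {similarity.item():.3f}")
```

A high similarity between original and rephrased questions is only a proxy, but it illustrates the intuition behind the finding: an encoder that maps paraphrases to nearby representations gives the downstream VQA model a better chance of answering both consistently.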
Anthology ID:
2020.findings-emnlp.257
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Venues:
EMNLP | Findings
Publisher:
Association for Computational Linguistics
Pages:
2863–2868
URL:
https://aclanthology.org/2020.findings-emnlp.257
DOI:
10.18653/v1/2020.findings-emnlp.257
Cite (ACL):
Shailza Jolly and Shubham Kapoor. 2020. Can Pre-training help VQA with Lexical Variations?. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2863–2868, Online. Association for Computational Linguistics.
Cite (Informal):
Can Pre-training help VQA with Lexical Variations? (Jolly & Kapoor, Findings 2020)
PDF:
https://preview.aclanthology.org/update-css-js/2020.findings-emnlp.257.pdf
Optional supplementary material:
 2020.findings-emnlp.257.OptionalSupplementaryMaterial.pdf