Semantically Distributed Robust Optimization for Vision-and-Language Inference

Tejas Gokhale, Abhishek Chaudhary, Pratyay Banerjee, Chitta Baral, Yezhou Yang


Abstract
Analysis of vision-and-language models has revealed their brittleness under linguistic phenomena such as paraphrasing, negation, textual entailment, and word substitutions with synonyms or antonyms. While data augmentation techniques have been designed to mitigate these failure modes, methods that integrate this knowledge into the training pipeline remain under-explored. In this paper, we present SDRO, a model-agnostic method that utilizes a set of linguistic transformations in a distributed robust optimization setting, along with an ensembling technique to leverage these transformations during inference. Experiments on benchmark datasets with images (NLVR2) and video (VIOLIN) demonstrate performance improvements as well as robustness to adversarial attacks. Experiments on binary VQA explore the generalizability of this method to other V&L tasks.
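
The abstract only outlines the approach; as a rough illustration of the general idea (not the authors' implementation, which is released at asu-apg/vli_sdro), a distributed-robust-style objective over a set of sentence transformations and a simple prediction ensemble at inference might look like the sketch below. The `model(image, sentence)` interface, the function names, and the assumption that the transformations are label-preserving are all illustrative choices, not details taken from the paper.

```python
import torch

def worst_case_loss(model, image, sentences, label, criterion):
    """Illustrative DRO-style objective: compute the loss on the original
    sentence and each of its linguistic transformations, then let the
    worst-case (highest-loss) variant drive the parameter update."""
    losses = []
    for sentence in sentences:              # original + transformed variants
        logits = model(image, sentence)     # hypothetical V&L model interface
        losses.append(criterion(logits, label))
    return torch.stack(losses).max()

def ensemble_predict(model, image, sentences):
    """Illustrative test-time ensembling: average predicted probabilities
    over the original sentence and its label-preserving transformations."""
    probs = [torch.softmax(model(image, s), dim=-1) for s in sentences]
    return torch.stack(probs).mean(dim=0).argmax(dim=-1)
```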
Anthology ID:
2022.findings-acl.118
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1493–1513
URL:
https://aclanthology.org/2022.findings-acl.118
DOI:
10.18653/v1/2022.findings-acl.118
Cite (ACL):
Tejas Gokhale, Abhishek Chaudhary, Pratyay Banerjee, Chitta Baral, and Yezhou Yang. 2022. Semantically Distributed Robust Optimization for Vision-and-Language Inference. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1493–1513, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Semantically Distributed Robust Optimization for Vision-and-Language Inference (Gokhale et al., Findings 2022)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.findings-acl.118.pdf
Code
 asu-apg/vli_sdro
Data
Violin