Worst of Both Worlds: Biases Compound in Pre-trained Vision-and-Language Models

Tejas Srinivasan; Yonatan Bisk

doi:10.18653/v1/2022.gebnlp-1.10

Abstract

Numerous works have analyzed biases in vision and pre-trained language models individually; however, less attention has been paid to how these biases interact in multimodal settings. This work extends text-based bias analysis methods to investigate multimodal language models, and analyzes intra- and inter-modality associations and biases learned by these models. Specifically, we demonstrate that VL-BERT (Su et al., 2020) exhibits gender biases, often preferring to reinforce a stereotype over faithfully describing the visual scene. We demonstrate these findings on a controlled case study and extend them to a larger set of stereotypically gendered entities.

Anthology ID:: 2022.gebnlp-1.10
Volume:: Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month:: July
Year:: 2022
Address:: Seattle, Washington
Editors:: Christian Hardmeier, Christine Basta, Marta R. Costa-jussà, Gabriel Stanovsky, Hila Gonen
Venue:: GeBNLP
Publisher:: Association for Computational Linguistics
Pages:: 77–85
URL:: https://preview.aclanthology.org/ingest-emnlp/2022.gebnlp-1.10/
DOI:: 10.18653/v1/2022.gebnlp-1.10
Cite (ACL):: Tejas Srinivasan and Yonatan Bisk. 2022. Worst of Both Worlds: Biases Compound in Pre-trained Vision-and-Language Models. In Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 77–85, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal):: Worst of Both Worlds: Biases Compound in Pre-trained Vision-and-Language Models (Srinivasan & Bisk, GeBNLP 2022)
PDF:: https://preview.aclanthology.org/ingest-emnlp/2022.gebnlp-1.10.pdf
Video:: https://preview.aclanthology.org/ingest-emnlp/2022.gebnlp-1.10.mp4