@inproceedings{srinivasan-bisk-2022-worst,
    title = "Worst of Both Worlds: Biases Compound in Pre-trained Vision-and-Language Models",
    author = "Srinivasan, Tejas  and
      Bisk, Yonatan",
    editor = "Hardmeier, Christian  and
      Basta, Christine  and
      Costa-juss{\`a}, Marta R.  and
      Stanovsky, Gabriel  and
      Gonen, Hila",
    booktitle = "Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)",
    month = jul,
    year = "2022",
    address = "Seattle, Washington",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2022.gebnlp-1.10/",
    doi = "10.18653/v1/2022.gebnlp-1.10",
    pages = "77--85",
    abstract = "Numerous works have analyzed biases in vision and pre-trained language models individually - however, less attention has been paid to how these biases interact in multimodal settings. This work extends text-based bias analysis methods to investigate multimodal language models, and analyzes intra- and inter-modality associations and biases learned by these models. Specifically, we demonstrate that VL-BERT (Su et al., 2020) exhibits gender biases, often preferring to reinforce a stereotype over faithfully describing the visual scene. We demonstrate these findings on a controlled case-study and extend them for a larger set of stereotypically gendered entities."
}