Abstract
Despite the impressive performance achieved by pre-trained language-and-vision models in downstream tasks, it remains an open question whether this reflects a proper understanding of image-text interaction. In this work, we explore to what extent they handle basic linguistic constructions—active-passive voice, coordination, and relative clauses—that even preschool children can typically master. We present BLA, a novel, automatically constructed benchmark to evaluate multimodal models on these Basic Language Abilities. We show that different types of Transformer-based systems, such as CLIP, ViLBERT, and BLIP2, generally struggle with BLA in a zero-shot setting, in line with previous findings. Our experiments, in particular, show that most of the tested models only marginally benefit when fine-tuned or prompted with construction-specific samples. Yet, the generative BLIP2 shows promising trends, especially in an in-context learning setting. This opens the door to using BLA not only as an evaluation benchmark but also to improve models’ basic language abilities.
- Anthology ID:
- 2023.emnlp-main.356
- Volume:
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 5817–5830
- URL:
- https://preview.aclanthology.org/add_missing_videos/2023.emnlp-main.356/
- DOI:
- 10.18653/v1/2023.emnlp-main.356
- Cite (ACL):
- Xinyi Chen, Raquel Fernández, and Sandro Pezzelle. 2023. The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5817–5830, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models (Chen et al., EMNLP 2023)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2023.emnlp-main.356.pdf
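
To make the zero-shot setting described in the abstract concrete, below is a minimal sketch of how one might probe a contrastive model like CLIP on a BLA-style item: the model scores an image against a pair of captions that differ only in a target construction (here, active vs. passive voice). This is an illustrative assumption, not the authors' released evaluation code; the `openai/clip-vit-base-patch32` checkpoint, the image path, and the caption pair are all hypothetical examples rather than actual benchmark items.

```python
# Hypothetical BLA-style zero-shot probe with an off-the-shelf CLIP checkpoint.
# Not the paper's evaluation code; image path and captions are illustrative.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # hypothetical local image

# An active-passive minimal pair: only one caption is assumed to match the image.
captions = [
    "The dog is chasing the cat.",
    "The dog is being chased by the cat.",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_image  # shape: (1, num_captions)
probs = logits.softmax(dim=-1)

# A model with the relevant basic language ability should prefer the caption
# that correctly describes who does what to whom in the depicted event.
best = probs.argmax(dim=-1).item()
print(f"Preferred caption: {captions[best]!r} (p={probs[0, best]:.3f})")
```

The point of such a pairwise setup, as the abstract suggests, is that both captions mention the same entities and action, so a preference for the correct sentence cannot be driven by word-level matching alone but requires sensitivity to the construction itself.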