Abstract
A method for creating a vision-and-language (V&L) model is to extend a language model through structural modifications and V&L pre-training. Such an extension aims to make the V&L model inherit the natural language understanding (NLU) capability of the original language model. To assess how well this is achieved, we propose evaluating V&L models on an NLU benchmark (GLUE). We compare five V&L models, including single-stream and dual-stream models, trained with the same pre-training. Dual-stream models, whose higher modality independence is achieved by approximately doubling the number of parameters, are expected to preserve the NLU capability better. Our main finding is that, contrary to expectation, the dual-stream scores differ little from the single-stream scores. Further analysis shows that pre-training causes the performance drop on NLU tasks, with few exceptions. These results suggest that adopting a single-stream structure and devising the pre-training could be an effective way to preserve language knowledge in V&L extensions.
- Anthology ID:
- 2021.emnlp-main.167
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2189–2196
- URL:
- https://aclanthology.org/2021.emnlp-main.167
- DOI:
- 10.18653/v1/2021.emnlp-main.167
- Cite (ACL):
- Taichi Iki and Akiko Aizawa. 2021. Effect of Visual Extensions on Natural Language Understanding in Vision-and-Language Models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2189–2196, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Effect of Visual Extensions on Natural Language Understanding in Vision-and-Language Models (Iki & Aizawa, EMNLP 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.emnlp-main.167.pdf
- Code
- alab-nii/eval_vl_glue
- Data
- CoLA, GLUE, MRPC, QNLI, SST