Visio-Linguistic Brain Encoding
Subba Reddy Oota, Jashn Arora, Vijay Rowtula, Manish Gupta, Raju S. Bapi
Abstract
Brain encoding aims at reconstructing fMRI brain activity given a stimulus. There exists a plethora of neural encoding models that study brain encoding for single-mode stimuli: visual (pretrained CNNs) or text (pretrained language models). A few recent papers have also obtained separate visual and text representations and performed late fusion using simple heuristics. However, previous work has not explored co-attentive multi-modal modeling for visual and text reasoning. In this paper, we systematically explore the efficacy of image and multi-modal Transformers for brain encoding. Extensive experiments on two popular datasets, BOLD5000 and Pereira, provide the following insights. (1) We find that VisualBERT, a multi-modal Transformer, significantly outperforms previously proposed single-mode CNNs, image Transformers, and other previously proposed multi-modal models, thereby establishing a new state of the art. (2) Regions such as LPTG, LMTG, LIFG, and STS, which have dual functionality for language and vision, correlate more strongly with multi-modal models, reinforcing the view that these models are good at mimicking human brain behavior. (3) The superiority of visio-linguistic models raises the question of whether the responses elicited in the visual regions are implicitly affected by linguistic processing even when images are viewed passively. Future fMRI studies can verify this computational insight in an appropriate experimental setting. We make our code publicly available.
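The pipeline the abstract describes, extracting stimulus representations from a pretrained model and regressing voxel-wise fMRI activity on them, can be sketched in two steps. First, feature extraction: the snippet below pools VisualBERT hidden states into a single stimulus vector. This is a minimal sketch, not the authors' released code; the random `visual_embeds` tensor is a placeholder for Faster R-CNN region features, so the (36 regions, 2048 dims) shape is an illustrative assumption.

```python
# Hedged sketch: pooled VisualBERT features for one image-caption pair.
# Region features normally come from an object detector; a random tensor
# stands in here, so its shape is an assumption, not the paper's setup.
import torch
from transformers import BertTokenizer, VisualBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = VisualBertModel.from_pretrained("uclanlp/visualbert-vqa-coco-pre")

inputs = tokenizer("a person riding a horse", return_tensors="pt")
visual_embeds = torch.randn(1, 36, 2048)  # placeholder detector features
inputs.update({
    "visual_embeds": visual_embeds,
    "visual_attention_mask": torch.ones(1, 36),
    "visual_token_type_ids": torch.ones(1, 36, dtype=torch.long),
})
with torch.no_grad():
    out = model(**inputs)
stimulus_vector = out.last_hidden_state.mean(dim=1)  # (1, 768) pooled features
```

Second, the encoding model itself. A minimal sketch, assuming ridge regression as the voxel-wise mapping (a standard choice in the encoding literature) and cross-validated Pearson correlation as the evaluation metric; the array shapes and alpha grid are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def fit_encoder(X, Y, alphas=np.logspace(-1, 4, 10), n_splits=5):
    """Fit a ridge encoder from stimulus features X (n_stimuli, n_features)
    to fMRI responses Y (n_stimuli, n_voxels); return per-voxel
    cross-validated Pearson correlations."""
    preds = np.zeros(Y.shape)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train, test in folds.split(X):
        model = RidgeCV(alphas=alphas)  # regularization picked by internal CV
        model.fit(X[train], Y[train])
        preds[test] = model.predict(X[test])
    # One correlation per voxel between predicted and measured activity
    return np.array([np.corrcoef(preds[:, v], Y[:, v])[0, 1]
                     for v in range(Y.shape[1])])
```

Comparing the correlation profiles of such encoders built from CNN, image-Transformer, and multi-modal features is what underlies the paper's region-level claims.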
- Anthology ID: 2022.coling-1.11
- Volume: Proceedings of the 29th International Conference on Computational Linguistics
- Month: October
- Year: 2022
- Address: Gyeongju, Republic of Korea
- Editors: Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
- Venue: COLING
- Publisher: International Committee on Computational Linguistics
- Pages: 116–133
- URL: https://aclanthology.org/2022.coling-1.11
- Cite (ACL): Subba Reddy Oota, Jashn Arora, Vijay Rowtula, Manish Gupta, and Raju S. Bapi. 2022. Visio-Linguistic Brain Encoding. In Proceedings of the 29th International Conference on Computational Linguistics, pages 116–133, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Cite (Informal): Visio-Linguistic Brain Encoding (Oota et al., COLING 2022)
- PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2022.coling-1.11.pdf
- Data: ImageNet