Raju Bapi
Also published as: Bapi Raju Surampudi
2025
Aligning Text/Speech Representations from Multimodal Models with MEG Brain Activity During Listening
Padakanti Srijith | Khushbu Pahwa | Radhika Mamidi | Bapi Raju Surampudi | Manish Gupta | Subba Reddy Oota
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Although speech language models are expected to align well with brain language processing during speech comprehension, recent studies have found that they fail to capture brain-relevant semantics beyond low-level features. Surprisingly, text-based language models exhibit stronger alignment with brain language regions, as they better capture brain-relevant semantics. However, no prior work has examined the alignment effectiveness of text/speech representations from multimodal models. This raises several key questions: Can speech embeddings from such multimodal models capture brain-relevant semantics through cross-modal interactions? Which modality can take advantage of this synergistic multimodal understanding to improve alignment with brain language processing? Can text/speech representations from such multimodal models outperform those from unimodal models? To address these questions, we systematically analyze multiple multimodal models, extracting both text- and speech-based representations to assess their alignment with MEG brain recordings during naturalistic story listening. We find that text embeddings from both multimodal and unimodal models significantly outperform speech embeddings from these models. Specifically, multimodal text embeddings exhibit a peak around 200 ms, suggesting that they benefit from speech embeddings, with heightened activity during this time period. However, speech embeddings from these multimodal models show alignment similar to that of their unimodal counterparts, suggesting that they do not gain meaningful semantic benefits from text-based representations. These results highlight an asymmetry in cross-modal knowledge transfer, where the text modality benefits from speech information, but not vice versa.
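The alignment analysis described in this abstract typically amounts to a cross-validated encoding model: stimulus embeddings are regressed onto MEG sensor responses and scored by held-out correlation. The following is a minimal sketch under that assumption; the function names, array shapes, and ridge settings are illustrative and not taken from the paper's released code.

```python
# Minimal sketch of a sensor-wise MEG encoding analysis: ridge regression maps
# stimulus embeddings (text or speech) to MEG responses, and cross-validated
# Pearson correlation measures brain alignment. Shapes and names are illustrative.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def encoding_alignment(embeddings, meg, n_folds=5, alphas=(1.0, 10.0, 100.0, 1000.0)):
    """embeddings: (n_timepoints, n_features) stimulus representation.
    meg: (n_timepoints, n_sensors) MEG recording aligned to the same timepoints.
    Returns the mean cross-validated correlation per sensor."""
    corrs = np.zeros((n_folds, meg.shape[1]))
    kf = KFold(n_splits=n_folds, shuffle=False)  # contiguous folds preserve temporal order
    for fold, (train, test) in enumerate(kf.split(embeddings)):
        model = RidgeCV(alphas=alphas)
        model.fit(embeddings[train], meg[train])
        pred = model.predict(embeddings[test])
        # correlation between predicted and observed response for every sensor
        for s in range(meg.shape[1]):
            corrs[fold, s] = np.corrcoef(pred[:, s], meg[test][:, s])[0, 1]
    return corrs.mean(axis=0)

# Hypothetical usage: compare text vs. speech embeddings from a multimodal model.
# text_emb, speech_emb, and meg_data would come from the feature-extraction pipeline.
# text_score = encoding_alignment(text_emb, meg_data)
# speech_score = encoding_alignment(speech_emb, meg_data)
```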
2023
How does the brain process syntactic structure while listening?
Subba Reddy Oota | Mounika Marreddy | Manish Gupta | Raju Bapi
Findings of the Association for Computational Linguistics: ACL 2023
Syntactic parsing is the task of assigning a syntactic structure to a sentence. There are two popular syntactic parsing methods: constituency and dependency parsing. Recent works have used syntactic embeddings based on constituency trees, incremental top-down parsing, and other word-level syntactic features to predict brain activity given text stimuli, in order to study how syntactic structure is represented in the brain’s language network. However, the effectiveness of dependency parse trees, and the relative predictive power of the various syntax parsers across brain areas, especially for the listening task, remain unexplored. In this study, we investigate the predictive power of brain encoding models in three settings: (i) individual performance of the constituency- and dependency-parsing-based embedding methods, (ii) efficacy of these syntactic embedding methods when controlling for basic syntactic signals, and (iii) relative effectiveness of each syntactic embedding method when controlling for the other. Further, we explore the relative importance of syntactic information (from these syntactic embedding methods) versus semantic information using BERT embeddings. We find that constituency parsers help explain activations in the temporal lobe and middle frontal gyrus, while dependency parsers better encode syntactic structure in the angular gyrus and posterior cingulate cortex. Although semantic signals from BERT are more effective than any of the syntactic features or embedding methods, the syntactic embedding methods explain additional variance for a few brain regions.
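The "controlling for" settings in this abstract correspond to comparing encoding performance with and without a given feature space. Below is a minimal sketch of that comparison, assuming stimulus-aligned feature matrices and voxel/ROI responses; the helper names and ridge configuration are illustrative, not the authors' implementation.

```python
# Minimal sketch of estimating the unique contribution of one feature space
# (e.g., a dependency-parse embedding) as the gain in cross-validated R^2 when it
# is added on top of another feature space (e.g., basic syntactic signals or a
# constituency embedding). Variable names are illustrative.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score

def cv_r2(features, brain, alphas=(1.0, 10.0, 100.0, 1000.0), cv=5):
    """Cross-validated R^2 of a ridge encoding model, averaged over brain targets."""
    pred = cross_val_predict(RidgeCV(alphas=alphas), features, brain, cv=cv)
    return r2_score(brain, pred, multioutput="uniform_average")

def unique_variance(feat_a, feat_b, brain):
    """R^2 gained by feature space A beyond what feature space B already explains."""
    r2_b = cv_r2(feat_b, brain)
    r2_joint = cv_r2(np.hstack([feat_a, feat_b]), brain)
    return r2_joint - r2_b

# Hypothetical usage with stimulus-aligned features and voxel/ROI responses:
# gain = unique_variance(dependency_emb, constituency_emb, brain_responses)
```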
Co-authors
- Manish Gupta 2
- Subba Reddy Oota 2
- Radhika Mamidi 1
- Mounika Marreddy 1
- Khushbu Pahwa 1