Decoding the Multimodal Mind: Generalizable Brain-to-Text Translation via Multimodal Alignment and Adaptive Routing

Chunyu Ye, Yunhao Zhang, Jingyuan Sun, Chong Li, Yang Zhao, Shaonan Wang


Abstract
Decoding language from the human brain remains a grand challenge for Brain-Computer Interfaces (BCIs). Current approaches typically rely on unimodal brain representations, neglecting the brain’s inherently multimodal processing. Inspired by the brain’s associative mechanisms, where viewing an image can evoke related sounds and linguistic representations, we propose a unified framework that leverages Multimodal Large Language Models (MLLMs) to align brain signals with a shared semantic space encompassing text, images, and audio. A router module dynamically selects and fuses modality-specific brain features according to the characteristics of each stimulus. Experiments on various fMRI datasets with textual, visual, and auditory stimuli demonstrate state-of-the-art performance, achieving an 8.48% average improvement on the most commonly used benchmark. We further extend our framework to EEG and MEG data, demonstrating flexibility and robustness across varying temporal and spatial resolutions. To our knowledge, this is the first unified BCI architecture capable of robustly decoding multimodal brain activity across diverse brain signals and stimulus types, offering a flexible solution for real-world applications.
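As a rough illustration of the routing idea described in the abstract, the sketch below weights modality-specific feature vectors with a softmax gate and sums them into one fused vector. It is only a minimal sketch under stated assumptions, not the authors' implementation: the linear scoring rule, the function name `route_and_fuse`, the `router_weights` parameter, and the toy feature values are all hypothetical, and the abstract does not specify how the actual router computes or applies its weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_and_fuse(features, router_weights):
    """Gate and fuse modality-specific feature vectors.

    features:       dict mapping modality name -> feature vector (list of floats),
                    all vectors sharing the same length.
    router_weights: dict mapping modality name -> scoring vector of that length.
                    ASSUMPTION: a linear score per modality; the paper's router
                    may be computed quite differently.
    Returns (gates, fused): per-modality gate values and the gated sum.
    """
    modalities = sorted(features)
    # One scalar score per modality: dot product of weights and features.
    scores = [
        sum(w * x for w, x in zip(router_weights[m], features[m]))
        for m in modalities
    ]
    gates = softmax(scores)
    dim = len(features[modalities[0]])
    # Fused vector: gate-weighted sum of the modality-specific vectors.
    fused = [
        sum(g * features[m][i] for g, m in zip(gates, modalities))
        for i in range(dim)
    ]
    return dict(zip(modalities, gates)), fused


if __name__ == "__main__":
    # Toy, made-up features standing in for brain-derived representations.
    feats = {"text": [1.0, 0.0], "image": [0.0, 1.0], "audio": [0.0, 0.0]}
    weights = {"text": [2.0, 0.0], "image": [0.0, 0.5], "audio": [0.0, 0.0]}
    gates, fused = route_and_fuse(feats, weights)
    print(gates)
    print(fused)
```

A soft gate of this kind keeps every modality in play while letting the stimulus shift the balance; a hard top-k selection would be another plausible reading of "selects and fuses".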
Anthology ID:
2026.findings-acl.1131
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
22532–22546
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1131/
Cite (ACL):
Chunyu Ye, Yunhao Zhang, Jingyuan Sun, Chong Li, Yang Zhao, and Shaonan Wang. 2026. Decoding the Multimodal Mind: Generalizable Brain-to-Text Translation via Multimodal Alignment and Adaptive Routing. In Findings of the Association for Computational Linguistics: ACL 2026, pages 22532–22546, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Decoding the Multimodal Mind: Generalizable Brain-to-Text Translation via Multimodal Alignment and Adaptive Routing (Ye et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1131.pdf
Checklist:
 2026.findings-acl.1131.checklist.pdf