Congchi Yin
2026
Language Reconstruction with Brain Predictive Coding from fMRI Data
Congchi Yin | Ziyi Ye | Piji Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Many recent studies have shown that perceived speech can be decoded from brain signals and reconstructed as continuous language. However, there is little neurological grounding for how the semantic information embedded in brain signals can be used more effectively to guide language reconstruction. Predictive coding theory suggests that the human brain continuously predicts upcoming words across multiple timescales. This implies that brain signals may carry information about upcoming content that decoding can exploit. To explore predictive coding theory in the context of language reconstruction, this paper proposes PredFT (FMRI-to-Text decoding with Predictive coding). PredFT consists of a main network and a side network. The side network extracts a brain predictive representation from related regions of interest (ROIs) using a self-attention module. This representation is then fused into the main network for continuous language decoding. Experiments on two naturalistic language comprehension fMRI datasets show that PredFT outperforms current decoding models on several evaluation metrics.
2025
Rethinking Cross-Subject Data Splitting for Brain-to-Text Decoding
Congchi Yin | Qian Yu | Zhiwei Fang | Changping Peng | Piji Li
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Recent major milestones have successfully reconstructed natural language from non-invasive brain signals (e.g., functional magnetic resonance imaging (fMRI) and electroencephalography (EEG)) across subjects. However, we find that current dataset splitting strategies for cross-subject brain-to-text decoding are flawed. Specifically, we first demonstrate that all current splitting methods suffer from data leakage, in which validation and test data leak into the training set, resulting in significant overfitting and overestimated decoding performance. In this study, we develop a correct cross-subject data splitting criterion, free of data leakage, for decoding fMRI and EEG signals to text. Several state-of-the-art (SOTA) brain-to-text decoding models are then re-evaluated under the proposed criterion to support further research.
Improve Language Model and Brain Alignment via Associative Memory
Congchi Yin | Yongpeng Zhang | Xuyun Wen | Piji Li
Findings of the Association for Computational Linguistics: ACL 2025
In the human cognitive system, associative memory integrates relevant information to support comprehension. In this work, we seek to improve the alignment between language models and the human brain during speech processing by integrating associative memory. We first verify the alignment between language models and the brain by mapping language model activations to brain activity; we then expand the original text stimuli with simulated associative memory and use them as input to computational language models. We find that alignment improves in brain regions closely related to associative memory processing. We further build the Association dataset, which contains 1,000 story samples pairing instructions that encourage associative memory (input) with associated content (output), and demonstrate that large language models after this specific supervised fine-tuning align better with brain responses.