Congchi Yin

2025

Rethinking Cross-Subject Data Splitting for Brain-to-Text Decoding
Congchi Yin | Qian Yu | Zhiwei Fang | Changping Peng | Piji Li
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Recent studies have successfully reconstructed natural language from non-invasive brain signals (e.g. functional Magnetic Resonance Imaging (fMRI) and Electroencephalogram (EEG)) across subjects. However, we find that current dataset splitting strategies for cross-subject brain-to-text decoding are flawed. Specifically, we first demonstrate that all current splitting methods suffer from a data leakage problem, in which validation and test data leak into the training set, resulting in significant overfitting and overestimation of decoding model performance. In this study, we develop a correct cross-subject data splitting criterion, free of data leakage, for decoding fMRI and EEG signals to text. Several SOTA brain-to-text decoding models are re-evaluated with the proposed criterion to support further research.
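The abstract does not spell out the proposed criterion, so the following is only a minimal sketch of the kind of leakage it describes, under the assumption that each sample is a (subject, stimulus) pair and that leakage means the same text stimulus appearing in both the training and the test set. The names `Sample`, `split_by_stimulus`, and `leaked_stimuli` are hypothetical and are not taken from the paper.

```python
import random
from collections import namedtuple

# Hypothetical sample record: which subject was recorded, and which text
# stimulus they were exposed to. Real datasets would also carry the signal.
Sample = namedtuple("Sample", ["subject", "stimulus_id"])


def leaked_stimuli(train, test):
    """Return the stimulus ids that occur in both train and test."""
    return {s.stimulus_id for s in train} & {s.stimulus_id for s in test}


def split_by_stimulus(samples, test_fraction=0.2, seed=0):
    """Assign whole stimuli to train or test, so no text is shared.

    This is an illustrative strategy, not necessarily the paper's criterion.
    """
    stimuli = sorted({s.stimulus_id for s in samples})
    rng = random.Random(seed)
    rng.shuffle(stimuli)
    n_test = max(1, int(len(stimuli) * test_fraction))
    test_ids = set(stimuli[:n_test])
    train = [s for s in samples if s.stimulus_id not in test_ids]
    test = [s for s in samples if s.stimulus_id in test_ids]
    return train, test


# Toy data: three subjects who all saw the same ten stimuli.
samples = [Sample(subj, stim) for subj in ("S1", "S2", "S3") for stim in range(10)]

# Naive cross-subject split: hold out one subject. Every test stimulus
# has already been seen during training through the other subjects.
naive_train = [s for s in samples if s.subject != "S3"]
naive_test = [s for s in samples if s.subject == "S3"]

# Stimulus-level split: held-out stimuli never appear in training.
train, test = split_by_stimulus(samples)
```

In this toy setup the subject-held-out split shares all ten stimuli between training and test, while the stimulus-level split shares none; a real pipeline would likely also need to decide how to treat held-out subjects.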

Improve Language Model and Brain Alignment via Associative Memory
Congchi Yin | Yongpeng Zhang | Xuyun Wen | Piji Li
Findings of the Association for Computational Linguistics: ACL 2025

Associative memory integrates relevant information for comprehension in the human cognitive system. In this work, we seek to improve the alignment between language models and the human brain during speech processing by integrating associative memory. After verifying the alignment between language models and the brain by mapping language model activations to brain activity, we use the original text stimuli, expanded with simulated associative memory, as input to computational language models. We find that the alignment between language models and the brain improves in brain regions closely related to associative memory processing. We also demonstrate that large language models align better with brain responses after specific supervised fine-tuning, for which we build the Association dataset containing 1000 story samples, with instructions encouraging associative memory as input and associated content as output.
