Do Large Language Models Acquire Phrase-Based Processing? Evidence from Eye Movements and Model-Brain Alignment After Fine-Tuning

Xufeng Duan; Zhengwu Ma; Zhaoqian Yao; Jixing Li; Zhenguang Cai

Abstract

Autoregressive large language models (LLMs) process text token-by-token, yet the human language system operates over multi-word units. We ask whether aggregating LLM representations at the phrase level yields a closer correspondence to human reading behavior and language cortex than the default word-level representations, and whether phrase-segmentation fine-tuning amplifies this correspondence. Using Meta-Llama-3.1-8B (base and fine-tuned), we provide three converging lines of evidence. First, phrase-level attention features predict regressive eye-saccade patterns more closely than word-level features; a partial correlation analysis with a shuffled-boundary control indicates that this is not solely an aggregation artifact and that linguistic chunk boundaries explain unique variance beyond word-level attention. Second, fMRI encoding analyses show that fine-tuning selectively improves phrase encoding in left superior temporal gyrus and inferior frontal gyrus, with no improvement for word representations. Third, representational similarity analysis confirms a phrase-specific gain in model-brain geometric alignment. These results identify phrase-level representation as a critical granularity for LLM–human correspondence and suggest that targeted training can model human-like compositional processing, linking computational representations to hierarchical theories of language.
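The abstract does not say how word-level features are aggregated into phrases or how the representational similarity analysis is set up. The sketch below is one common way to do both, written under our own assumptions: word vectors are mean-pooled over phrase spans (end-exclusive index pairs), each set of items becomes a correlation-distance dissimilarity matrix (RDM), and model and brain RDMs are compared with Spearman's rho. All function names (`mean_pool`, `rdm`, `rsa`) and the toy data are hypothetical, and this is not the authors' pipeline.

```python
from math import sqrt


def mean_pool(word_vecs, spans):
    """Average the word vectors inside each (start, end) span; end is exclusive."""
    pooled = []
    for start, end in spans:
        chunk = word_vecs[start:end]
        dim = len(chunk[0])
        pooled.append([sum(v[k] for v in chunk) / len(chunk) for k in range(dim)])
    return pooled


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def rdm(vectors):
    """Upper triangle (i < j) of a correlation-distance dissimilarity matrix."""
    return [1 - pearson(vectors[i], vectors[j])
            for i in range(len(vectors)) for j in range(i + 1, len(vectors))]


def rsa(model_vecs, brain_vecs):
    """Spearman correlation between model and brain RDMs (same item order)."""
    return spearman(rdm(model_vecs), rdm(brain_vecs))


# Toy usage: six hypothetical word vectors grouped into three two-word phrases.
words = [[1, 0, 2], [3, 2, 0], [0, 1, 1], [2, 3, 1], [4, 0, 1], [0, 2, 3]]
spans = [(0, 2), (2, 4), (4, 6)]
phrases = mean_pool(words, spans)
# Hypothetical brain patterns: one row per phrase, one column per voxel.
brain = [[0.2, 0.1, 0.4, 0.3], [0.5, 0.9, 0.1, 0.2], [0.3, 0.2, 0.6, 0.2]]
print(rsa(phrases, brain))
```

A shuffled-boundary control in the spirit of the abstract could be approximated by calling `mean_pool` with randomly placed spans of the same lengths and comparing the resulting fit. That variant is our assumption, not a procedure the abstract describes.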

Anthology ID: 2026.scil-main.44
Volume: Proceedings of the Society for Computation in Linguistics 2026
Month: July
Year: 2026
Address: San Diego, CA
Editors: Rob Voigt, Alex Warstadt, Naomi Feldman, Tal Linzen
Venues: SCiL | WS
Publisher: Association for Computational Linguistics
Pages: 464–476
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.44/
Cite (ACL): Xufeng Duan, Zhengwu Ma, Zhaoqian Yao, Jixing Li, and Zhenguang Cai. 2026. Do Large Language Models Acquire Phrase-Based Processing? Evidence from Eye Movements and Model-Brain Alignment After Fine-Tuning. In Proceedings of the Society for Computation in Linguistics 2026, pages 464–476, San Diego, CA. Association for Computational Linguistics.
Cite (Informal): Do Large Language Models Acquire Phrase-Based Processing? Evidence from Eye Movements and Model-Brain Alignment After Fine-Tuning (Duan et al., SCiL 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.44.pdf