Information Integration in Large Language Models is Gated by Linguistic Structural Markers

Wei Liu; Nai Ding (丁鼐)

Abstract

Language comprehension relies on integrating information across both local words and broader context. We propose a method to quantify the information integration window of large language models (LLMs) and examine how sentence and clause boundaries constrain this window. Specifically, LLMs are required to predict a target word based on either a local window (local prediction) or the full context (global prediction), and we use Jensen-Shannon (JS) divergence to measure the information loss from relying solely on the local window, termed the local-prediction deficit. Results show that integration windows of both humans and LLMs are strongly modulated by sentence boundaries, and predictions primarily rely on words within the same sentence or clause: The local-prediction deficit follows a power-law decay as the window length increases and drops sharply at the sentence boundary. This boundary effect is primarily attributed to linguistic structural markers, e.g., punctuation, rather than implicit syntactic or semantic cues. Together, these results indicate that LLMs rely on explicit structural cues to guide their information integration strategy.
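
The local-prediction deficit described in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical `predict` callable that maps a list of context tokens to a probability distribution over a fixed vocabulary; the abstract does not specify the logarithm base or how the local window is presented to the model, so base 2 and simple truncation of the context are assumptions here, and `toy_predict` is an invented stand-in for an LLM.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, an assumption) between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same vocabulary")
    # Mixture distribution M = (P + Q) / 2.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute 0 by convention.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def local_prediction_deficit(predict, context, window):
    """JS divergence between the local and the global prediction of the target word.

    `predict` is a hypothetical callable: list of tokens -> next-word distribution.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    global_dist = predict(context)           # prediction from the full context
    local_dist = predict(context[-window:])  # prediction from the last `window` tokens only
    return js_divergence(local_dist, global_dist)


def toy_predict(tokens):
    # Hypothetical stand-in for an LLM over a two-word vocabulary:
    # the prediction changes only when "not" is visible in the context.
    return [0.1, 0.9] if "not" in tokens else [0.8, 0.2]


context = ["it", "is", "not", "very"]
for window in (1, 2, 3):
    print(window, local_prediction_deficit(toy_predict, context, window))
```

In this toy setting the deficit is positive while the window excludes "not" and falls to zero once the window is long enough to include it, which is the kind of window-length dependence the abstract describes for real models.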

Anthology ID: 2025.emnlp-main.351
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 6903–6915
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.351/
Cite (ACL): Wei Liu and Nai Ding. 2025. Information Integration in Large Language Models is Gated by Linguistic Structural Markers. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 6903–6915, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Information Integration in Large Language Models is Gated by Linguistic Structural Markers (Liu & Ding, EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.351.pdf
Checklist: 2025.emnlp-main.351.checklist.pdf
