@inproceedings{sun-hsieh-2025-much,
title = "How much do contextualized representations encode long-range context?",
author = "Sun, Simeng and
Hsieh, Cheng-Ping",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.90/",
pages = "1662--1679",
ISBN = "979-8-89176-195-7",
abstract = "We analyze contextual representations in neural autoregressive language models, emphasizing long-range contexts that span several thousand tokens. Our methodology employs a perturbation setup and the metric Anisotropy-Calibrated Cosine Similarity, to capture the degree of contextualization of long-range patterns from the perspective of representation geometry. We begin the analysis with a case study on standard decoder-only Transformers, demonstrating that similar perplexity can exhibit markedly different downstream task performance, which can be explained by the difference in contextualization of long-range content. Next, we extend the analysis to other models, covering recent novel architectural designs and various training configurations. The representation-level results illustrate a reduced capacity for high-complexity (i.e., less compressible) sequences across architectures, and that fully recurrent models rely heavily on local context, whereas hybrid models more effectively encode the entire sequence structure. Finally, preliminary analysis of model size and training configurations on the encoding of long-range context suggest potential directions for improving existing language models."
}
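
The abstract above describes a perturbation setup scored with Anisotropy-Calibrated Cosine Similarity. The sketch below illustrates one plausible form of such a calibrated score, assuming it is a paired cosine similarity with an anisotropy baseline (the average similarity of mismatched representation pairs) subtracted; the function name and the exact calibration procedure are illustrative assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F


def anisotropy_calibrated_cosine_similarity(
    original: torch.Tensor,   # (n, d) hidden states from the unperturbed context
    perturbed: torch.Tensor,  # (n, d) hidden states after perturbing long-range context
) -> float:
    """Hypothetical sketch: paired cosine similarity calibrated by an
    anisotropy baseline. Not the paper's exact formulation."""
    # Paired similarity: how close each token's representation stays
    # after the long-range context is perturbed.
    paired = F.cosine_similarity(original, perturbed, dim=-1).mean()

    # Anisotropy baseline: expected similarity between unrelated
    # representations, estimated here from randomly mismatched pairs.
    idx = torch.randperm(original.size(0))
    baseline = F.cosine_similarity(original, perturbed[idx], dim=-1).mean()

    # Calibrated score: a high value means the representations barely
    # moved under the perturbation (little encoding of the perturbed
    # long-range content); a value near zero means they shifted about
    # as much as unrelated pairs differ (strong contextualization).
    return (paired - baseline).item()
```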
Markdown (Informal)
[How much do contextualized representations encode long-range context?](https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.90/) (Sun & Hsieh, Findings 2025)