Huibin Wang

2026

Large Language Models exhibit degraded performance when extrapolating beyond training context lengths. Existing training-free methods like positional reuse or interpolation can alleviate this issue in an efficient manner. However, these strategies are semantics-agnostic by only considering relative token distances, which could indiscriminately blur semantically relevant and irrelevant tokens alike.To address this, we introduce an adaptive positional zooming method called **Relevance-Informed Positional Resource Allocation (RiPRA)**. RiPRA formulates positional encoding as a constrained resource allocation, in which a fixed positional budget is distributed across tokens in a longer context based on their semantic relevance to the query: relevant tokens get higher positional resolution, while irrelevant tokens (positions) are compressed. By doing this, RiPRA enables a dynamic and nonparametric positional zooming where the positional resolution is adaptively modulated across queries and network layers, effectively improving long-range context modeling and retrieval capacity. Besides, an isotonic smoothing is used to further enforce a global linear ordering relationship to preserve stability and generalization, together with a chunk-based hierarchical approximation to further reduce inference overhead. Extensive experiments across comprehensive benchmarks including LongBench, L-Eval, Passkey Retrieval, and PG19 demonstrate that RiPRA consistently outperforms existing training-free extrapolation methods, showing the value of relevance-conditioned positional encoding for long-context generalization.

Co-authors

Bin Tang 1

Kai Zhang 1

Hongbo Zhao 1

Venues

Findings1

Fix author