Runsong Zhao
2025
Position IDs Matter: An Enhanced Position Layout for Efficient Context Compression in Large Language Models
Runsong Zhao | Xin Liu | Xinyu Liu | Pengcheng Huang | Chunyang Xiao | Tong Xiao | JingBo Zhu
Findings of the Association for Computational Linguistics: EMNLP 2025
Using special tokens (e.g., gist, memory, or compressed tokens) to compress context information is a common practice for large language models (LLMs). However, existing approaches often neglect that position encodings inherently induce local inductive biases in models, causing the compression process to ignore holistic contextual dependencies. We propose **Enhanced Position Layout (EPL)**, a simple yet effective method that improves the context compression capability of LLMs solely by adjusting position IDs, the numerical identifiers that specify token positions. EPL minimizes the distance between context tokens and their corresponding special tokens while maintaining the sequence order in position IDs among context tokens, special tokens, and subsequent tokens. Integrating EPL into our best-performing context compression model yields an average improvement of 1.9 ROUGE-1 F1 points on out-of-domain question answering datasets. When extended to multimodal scenarios, EPL brings an average accuracy gain of 2.6 points to vision compression LLMs.
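The abstract's core constraint — pull each context token's position ID close to its special token while keeping IDs ordered — can be illustrated with a toy layout comparison. This sketch is entirely hypothetical: the segment sizes, the `trailing_layout`/`interleaved_layout` names, and the specific ID assignments are our illustration, not the paper's actual EPL scheme.

```python
# Toy comparison of two position-ID layouts for context compression.
# Hypothetical setup: 3 context segments of 4 tokens each, with one
# special (compressed) token per segment.

SEG_LENS = [4, 4, 4]

def trailing_layout(seg_lens):
    """Baseline: all special tokens placed after the whole context."""
    pos, ctx, spec = 0, [], []
    for n in seg_lens:
        ctx.append(list(range(pos, pos + n)))
        pos += n
    for _ in seg_lens:
        spec.append(pos)
        pos += 1
    return ctx, spec

def interleaved_layout(seg_lens):
    """EPL-flavoured toy: each special token sits right after its own
    segment, so every context token is close to its special token while
    all position IDs stay strictly increasing (order preserved)."""
    pos, ctx, spec = 0, [], []
    for n in seg_lens:
        ctx.append(list(range(pos, pos + n)))
        pos += n
        spec.append(pos)
        pos += 1
    return ctx, spec

def mean_gap(ctx, spec):
    """Average position-ID distance from a context token to its special token."""
    gaps = [s - p for seg, s in zip(ctx, spec) for p in seg]
    return sum(gaps) / len(gaps)

for name, layout in [("trailing", trailing_layout), ("interleaved", interleaved_layout)]:
    ctx, spec = layout(SEG_LENS)
    print(name, mean_gap(ctx, spec))  # trailing 7.5, interleaved 2.5
```

The toy only shows that a layout respecting the ordering constraint can still sharply reduce the context-to-special-token distance; the paper's actual ID assignment may differ.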
2024
Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-Context Models
Xinyu Liu | Runsong Zhao | Pengcheng Huang | Chunyang Xiao | Bei Li | Jingang Wang | Tong Xiao | JingBo Zhu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Numerous recent works aim to extend the effective context length of language models, and various methods, tasks, and benchmarks exist to measure a model's effective memory length. However, through thorough investigation, we find limitations in currently existing evaluations of model memory. In this work, we provide an extensive survey of these limitations and propose a new method, the forgetting curve, to measure the memorization capability of long-context models. We show that the forgetting curve is robust to the tested corpus and experimental settings, does not rely on prompts, and can be applied to models of any size. We apply our forgetting curve to a large variety of models spanning both transformer and RNN/SSM based architectures. Our measurement provides empirical evidence for the effectiveness of transformer extension techniques while raising questions about the effective length of RNN/SSM based models. We also examine the differences between our measurement and existing benchmarks as well as popular metrics for various models.
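A forgetting-curve-style probe could be sketched as follows. This is a toy under our own assumptions — the stand-in `windowed_recall` model and the distance grid are illustrative, not the paper's protocol: recall of a token seen `d` positions earlier is measured as a function of `d`.

```python
# Toy forgetting-curve-style measurement (our illustrative reading of
# the idea, not the paper's actual protocol).

def windowed_recall(memory_window, distance):
    """Stand-in model: recalls tokens within its window, fails beyond it."""
    return 1.0 if distance <= memory_window else 0.0

def forgetting_curve(memory_window, distances):
    """Recall as a function of how far back the probed token appeared."""
    return {d: windowed_recall(memory_window, d) for d in distances}

curve = forgetting_curve(memory_window=8, distances=[1, 4, 8, 16, 32])
print(curve)  # {1: 1.0, 4: 1.0, 8: 1.0, 16: 0.0, 32: 0.0}
```

For a real model, `windowed_recall` would be replaced by an actual recall measurement; the point of the curve is only the shape of recall versus distance.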
Co-authors
- Pengcheng Huang 2
- Xinyu Liu 2
- Chunyang Xiao 2
- Tong Xiao (肖桐) 2
- Jingbo Zhu (朱靖波) 2