Jiwei Tang
2026
Read As Human: Compressing Context via Parallelizable Close Reading and Skimming
Jiwei Tang | Shilei Liu | Zhicheng Zhang | Qingsong Lv | Runsong Zhao | Tingwei Lu | Langming Liu | Haibin Chen | Yujin Yuan | Hai-Tao Zheng | Wenbo Su | Bo Zheng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) demonstrate exceptional capability across diverse tasks. However, their deployment in long-context scenarios is hindered by two challenges: computational inefficiency and redundant information. We propose RAM (Read As HuMan), a context compression framework that adopts an adaptive hybrid reading strategy, to address these challenges. Inspired by human reading behavior (i.e., close reading important content while skimming less relevant content), RAM partitions the context into segments and encodes them with the input query in parallel. High-relevance segments are fully retained (close reading), while low-relevance ones are compressed, under query guidance, into compact summary vectors (skimming). Both explicit textual segments and implicit summary vectors are concatenated and fed into the decoder to achieve both superior performance and natural language format interpretability. To refine the decision boundary between close reading and skimming, we further introduce a contrastive learning objective based on positive and negative query–segment pairs. Experiments demonstrate that RAM outperforms existing baselines on multiple question answering and summarization benchmarks across two backbones, while delivering up to a 12x end-to-end speedup on long inputs (average length 16K; maximum length 32K).
CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling
Runsong Zhao | Shilei Liu | Jiwei Tang | Langming Liu | Haibin Chen | Weidong Zhang | Yujin Yuan | Tong Xiao | JingBo Zhu | Wenbo Su | Bo Zheng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The quadratic complexity and indefinitely growing key-value (KV) cache of standard Transformers pose a major barrier to long-context processing. To overcome this, we introduce the **Co**llaborative **Me**mory **T**ransformer (CoMeT), a novel architecture that enables LLMs to handle arbitrarily long sequences with constant memory usage and linear time complexity. Designed as an efficient, plug-in module, CoMeT can be integrated into pre-trained models with only minimal fine-tuning. It operates on sequential data chunks, using a dual-memory system to manage context: a temporary memory on a FIFO queue for recent events, and a global memory with a gated update rule for long-range dependencies. These memories then act as a dynamic soft prompt for the next chunk. The effectiveness of our approach is remarkable: a model equipped with CoMeT and fine-tuned on 32k contexts can accurately retrieve a passkey from any position within a 1M token sequence. On the SCROLLS benchmark, CoMeT surpasses other efficient methods and achieves performance comparable to a full-attention baseline on summarization tasks. Its practical effectiveness is further validated on real-world agent and user behavior QA tasks, supported by a novel layer-level pipeline parallel training strategy that enables fine-tuning on extremely long contexts. The code is available at: https://github.com/LivingFutureLab/Comet
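As a rough illustration of the dual-memory idea described in the abstract above, the following minimal sketch keeps a bounded FIFO queue of recent chunk summaries and a gated running global memory. It is not the CoMeT implementation: the class name `DualMemory`, the scalar sigmoid gate, and the use of plain float lists for chunk summaries are all assumptions made for illustration.

```python
from collections import deque
import math


def sigmoid(x):
    """Logistic function used here as a hypothetical scalar gate."""
    return 1.0 / (1.0 + math.exp(-x))


class DualMemory:
    """Toy dual memory: FIFO temporary memory plus gated global memory.

    All names and the update rule are illustrative assumptions,
    not taken from the CoMeT code base.
    """

    def __init__(self, dim, queue_size):
        # Temporary memory: deque with maxlen drops the oldest entry (FIFO).
        self.temporary = deque(maxlen=queue_size)
        # Global memory: one fixed-size vector, so memory use stays constant.
        self.global_memory = [0.0] * dim

    def update(self, chunk_summary, gate_logit=0.0):
        """Fold one chunk summary (list of floats) into both memories."""
        self.temporary.append(list(chunk_summary))
        g = sigmoid(gate_logit)  # assumed gate; a real model would learn it
        self.global_memory = [
            (1.0 - g) * m + g * s
            for m, s in zip(self.global_memory, chunk_summary)
        ]

    def soft_prompt(self):
        """Vectors that would be prepended to the next chunk."""
        return [list(self.global_memory)] + [list(v) for v in self.temporary]
```

However many chunks are processed, `soft_prompt()` returns at most `queue_size + 1` vectors, which is the constant-memory property the abstract refers to.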
GMSA: Enhancing Context Compression via Group Merging and Layer Semantic Alignment
Jiwei Tang | Zhicheng Zhang | Shunlong Wu | Jingheng Ye | Lichen Bai | Zitai Wang | Tingwei Lu | Lin Hai | Yiming Zhao | Hai-Tao Zheng | Hong-Gee Kim
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have achieved remarkable performance across a wide range of Natural Language Processing (NLP) tasks. However, in long-context scenarios, they face two challenges: high computational cost and information redundancy. To address these challenges, we propose GMSA, an encoder-decoder context compression framework that generates a compact sequence of soft tokens for downstream tasks. GMSA introduces Group Merging to achieve more uniform aggregation, mitigating semantic dominance during autoencoder pretraining, and Layer Semantic Alignment (LSA) to bridge the semantic gap between high-level abstract semantics and low-level input semantics. We first pretrain GMSA as an autoencoder and then fine-tune it for downstream tasks. Experiments demonstrate that GMSA improves context reconstruction compared to the existing soft prompt compression paradigm and outperforms baselines on multiple long-context question answering and summarization benchmarks across two backbone models, while maintaining low end-to-end latency.
2025
DAST: Context-Aware Compression in LLMs via Dynamic Allocation of Soft Tokens
Shaoshen Chen | Yangning Li | Zishan Xu | Yongqin Zeng | Shunlong Wu | Xinshuo Hu | Zifei Shan | Xin Su | Jiwei Tang | Yinghui Li | Hai-Tao Zheng
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Models (LLMs) face computational inefficiencies and redundant processing when handling long context inputs, prompting a focus on compression techniques. While existing semantic vector-based compression methods achieve promising performance, these methods fail to account for the intrinsic information density variations between context chunks, instead allocating soft tokens uniformly across context chunks. This uniform distribution inevitably diminishes allocation to information-critical regions. To address this, we propose Dynamic Allocation of Soft Tokens (DAST), a simple yet effective method that leverages the LLM’s intrinsic understanding of contextual relevance to guide compression. DAST combines perplexity-based local information with attention-driven global information to dynamically allocate soft tokens to the information-rich chunks, enabling effective, context-aware compression. Experimental results across multiple benchmarks demonstrate that DAST surpasses state-of-the-art methods.
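To make the contrast with uniform allocation concrete, the sketch below splits a fixed soft-token budget across chunks in proportion to a blend of two per-chunk scores. The abstract does not specify how the perplexity-based and attention-driven signals are combined, so the linear mix with weight `alpha`, the function name `allocate_soft_tokens`, and the largest-remainder rounding are all assumptions, not the DAST method itself.

```python
def allocate_soft_tokens(ppl_scores, attn_scores, budget, alpha=0.5):
    """Split `budget` soft tokens across chunks by blended importance.

    ppl_scores / attn_scores: non-negative per-chunk scores (hypothetical
    stand-ins for the local and global signals named in the abstract).
    alpha: assumed mixing weight between the two signals.
    """
    if not ppl_scores or len(ppl_scores) != len(attn_scores):
        raise ValueError("score lists must be non-empty and equal length")

    def normalize(xs):
        total = sum(xs)
        if total <= 0:
            # No signal: fall back to a uniform distribution.
            return [1.0 / len(xs)] * len(xs)
        return [x / total for x in xs]

    p = normalize(ppl_scores)
    a = normalize(attn_scores)
    weights = [alpha * pi + (1.0 - alpha) * ai for pi, ai in zip(p, a)]

    # Largest-remainder rounding so the allocation sums exactly to budget.
    raw = [w * budget for w in weights]
    alloc = [int(r) for r in raw]
    leftover = budget - sum(alloc)
    by_remainder = sorted(
        range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc
```

With equal scores for every chunk this reduces to the uniform allocation the abstract criticizes; chunks with higher blended scores receive a larger share of the same budget.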
Perception Compressor: A Training-Free Prompt Compression Framework in Long Context Scenarios
Jiwei Tang | Jin Xu | Tingwei Lu | Zhicheng Zhang | Yiming Zhao | Lin Hai | Hai-Tao Zheng
Findings of the Association for Computational Linguistics: NAACL 2025
Large language models (LLMs) demonstrate exceptional capabilities in various scenarios. However, in long context scenarios they are burdened by redundant information and are sensitive to the position of key information. To address these challenges, we present Perception Compressor, a training-free prompt compression framework. It includes a perception retriever that leverages guiding questions and instruction to retrieve the most relevant demonstrations, a dual-slope ratio allocator to dynamically allocate compression ratios and open-book ratios, and a semi-guided iterative compression that retains key information at the token level while removing tokens that distract the LLM. We conduct extensive experiments on long context benchmarks, i.e., NaturalQuestions, LongBench, and MuSiQue. Experimental results show that Perception Compressor outperforms existing methods by a large margin, achieving state-of-the-art performance.
RAISE: Reinforced Adaptive Instruction Selection For Large Language Models
Qingsong Lv | Yangning Li | Zihua Lan | Zishan Xu | Jiwei Tang | Tingwei Lu | Yinghui Li | Wenhao Jiang | Hong-Gee Kim | Hai-Tao Zheng | Philip S. Yu
Findings of the Association for Computational Linguistics: EMNLP 2025
Instruction tuning of large language models (LLMs) benefits more from a handful of high-quality examples than from hordes of low-quality ones. Existing selection methods typically rely on static, heuristic quality scores and are executed only once before training. Consequently, they neither adapt to the changing state of the model nor target downstream objectives, leaving substantial room for optimization. We propose RAISE (**R**einforced **A**daptive **I**nstruction **SE**lection), a *dynamic*, *task-driven* framework that integrates selection into every training step. At each step, RAISE estimates the expected contribution of each candidate instruction to task performance and admits only the most helpful. By modeling this process as sequential decision making, we optimize the selector with reinforcement learning, yielding an interpretable policy specialized for the target task. Extensive experiments show that RAISE reaches comparable or better results than full-data training while updating only 1% of the steps, demonstrating both high efficacy and significant computational savings.
Co-authors
- Hai-Tao Zheng 5
- Tingwei Lu 4
- Zhicheng Zhang 3
- Haibin Chen 2
- Lin Hai 2
- Hong-Gee Kim 2
- Yangning Li 2
- Yinghui Li 2
- Shilei Liu 2
- Langming Liu 2
- Qingsong Lv 2
- Wenbo Su 2
- Shunlong Wu 2
- Zishan Xu 2
- Yujin Yuan 2
- Runsong Zhao 2
- Yiming Zhao 2
- Bo Zheng 2
- Lichen Bai 1
- Shaoshen Chen 1
- Xinshuo Hu 1
- Wenhao Jiang 1
- Zihua Lan 1
- Zifei Shan 1
- Xin Su 1
- Zitai Wang 1
- Tong Xiao (肖桐) 1
- Jin Xu 1
- Jingheng Ye 1
- Philip S. Yu 1
- Yongqin Zeng 1
- Weidong Zhang 1
- JingBo Zhu (朱靖波) 1