Lin Hai
2026
GMSA: Enhancing Context Compression via Group Merging and Layer Semantic Alignment
Jiwei Tang | Zhicheng Zhang | Shunlong Wu | Jingheng Ye | Lichen Bai | Zitai Wang | Tingwei Lu | Lin Hai | Yiming Zhao | Hai-Tao Zheng | Hong-Gee Kim
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have achieved remarkable performance across a wide range of Natural Language Processing (NLP) tasks. However, in long-context scenarios, they face two challenges: high computational cost and information redundancy. To address these challenges, we propose GMSA, an encoder-decoder context compression framework that generates a compact sequence of soft tokens for downstream tasks. GMSA introduces Group Merging to achieve more uniform aggregation, mitigating semantic dominance during autoencoder pretraining, and Layer Semantic Alignment (LSA) to bridge the semantic gap between high-level abstract semantics and low-level input semantics. We first pretrain GMSA as an autoencoder and then fine-tune it for downstream tasks. Experiments demonstrate that GMSA improves context reconstruction compared to the existing soft prompt compression paradigm and outperforms baselines on multiple long-context question answering and summarization benchmarks across two backbone models, while maintaining low end-to-end latency.
2025
Perception Compressor: A Training-Free Prompt Compression Framework in Long Context Scenarios
Jiwei Tang | Jin Xu | Tingwei Lu | Zhicheng Zhang | Yiming Zhao | Lin Hai | Hai-Tao Zheng
Findings of the Association for Computational Linguistics: NAACL 2025
Large language models (LLMs) demonstrate exceptional capabilities in various scenarios. However, in long-context scenarios they are hampered by redundant information and are sensitive to the position of key information. To address these challenges, we present Perception Compressor, a training-free prompt compression framework. It includes a perception retriever that leverages guiding questions and the instruction to retrieve the most relevant demonstrations, a dual-slope ratio allocator that dynamically allocates compression ratios and open-book ratios, and a semi-guided iterative compression step that retains key information at the token level while removing tokens that distract the LLM. We conduct extensive experiments on long-context benchmarks, namely NaturalQuestions, LongBench, and MuSiQue. Experimental results show that Perception Compressor outperforms existing methods by a large margin, achieving state-of-the-art performance.