Prompt Compression for Large Language Models: A Survey

Zongqian Li, Yinhong Liu, Yixuan Su, Nigel Collier


Abstract
Leveraging large language models (LLMs) for complex natural language tasks typically requires long-form prompts to convey detailed requirements and information, which results in increased memory usage and inference costs. To mitigate these challenges, multiple efficient methods have been proposed, with prompt compression gaining significant research interest. This survey provides an overview of prompt compression techniques, categorized into hard prompt methods and soft prompt methods. First, the technical approaches of these methods are compared, followed by an exploration of various ways to understand their mechanisms, including the perspectives of attention optimization, Parameter-Efficient Fine-Tuning (PEFT), modality integration, and new synthetic language. We also examine the downstream adaptations of various prompt compression techniques. Finally, the limitations of current prompt compression methods are analyzed, and several future directions are outlined, such as optimizing the compression encoder, combining hard and soft prompts methods, and leveraging insights from multimodality.
Anthology ID:
2025.naacl-long.368
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
7182–7195
Language:
URL:
https://preview.aclanthology.org/landing_page/2025.naacl-long.368/
DOI:
Bibkey:
Cite (ACL):
Zongqian Li, Yinhong Liu, Yixuan Su, and Nigel Collier. 2025. Prompt Compression for Large Language Models: A Survey. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 7182–7195, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Prompt Compression for Large Language Models: A Survey (Li et al., NAACL 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/landing_page/2025.naacl-long.368.pdf