@inproceedings{xu-etal-2025-procut,
title = "{P}ro{C}ut: {LLM} Prompt Compression via Attribution Estimation",
author = "Xu, Zhentao and
Li, Fengyi and
Chen, Albert C. and
Wang, Xiaofeng",
editor = "Potdar, Saloni and
Rojas-Barahona, Lina and
Montella, Sebastien",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.20/",
pages = "285--309",
ISBN = "979-8-89176-333-3",
abstract = "In large-scale industrial LLM systems, prompt templates often expand to thousands of tokens as teams iteratively incorporate sections such as task instructions, few-shot examples, and heuristic rules to enhance robustness and coverage. This expansion leads to bloated prompts that are difficult to maintain and incur significant inference latency and serving costs. To address this, we introduce Prompt Compression via Attribution Estimation (ProCut), a flexible, LLM-agnostic, training-free framework that compresses prompts through attribution analysis. ProCut segments prompt templates into semantically meaningful units, quantifies their impact on task performance, and prunes low-utility components. Through extensive experiments on five public benchmark datasets and real-world industrial prompts, we show that ProCut achieves substantial prompt size reductions (78{\%} fewer tokens in production) while maintaining or even slightly improving task performance (up to 62{\%} better than alternative methods). We further introduce an LLM-driven attribution estimator that reduces compression latency by over 50{\%}, and demonstrate that ProCut integrates seamlessly with existing prompt-optimization frameworks to produce concise, high-performing prompts."
}
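The abstract describes a segment–attribute–prune loop: split the prompt template into semantically meaningful units, estimate each unit's contribution to task performance, and drop low-utility units. The sketch below is only a minimal illustration of that idea via naive ablation, not the authors' implementation (the paper's attribution estimator is LLM-driven and reportedly much faster); `evaluate` and `segment_prompt` are hypothetical stand-ins for a task-scoring harness and a semantic segmenter.

```python
# Illustrative attribution-by-ablation pruning in the spirit of ProCut.
# All names here are assumptions: `evaluate` stands in for whatever
# benchmark harness scores a prompt on the target task, and
# `segment_prompt` stands in for a semantic segmenter.

from typing import Callable, List


def segment_prompt(template: str) -> List[str]:
    """Split a prompt template into candidate units (here: blank-line blocks)."""
    return [block for block in template.split("\n\n") if block.strip()]


def prune_prompt(
    template: str,
    evaluate: Callable[[str], float],
    tolerance: float = 0.0,
) -> str:
    """Greedily drop sections whose removal does not hurt the task score.

    Attribution is estimated by ablation: each section's utility is the
    score drop observed when the prompt is evaluated without it.
    """
    sections = segment_prompt(template)
    baseline = evaluate("\n\n".join(sections))
    kept = list(sections)
    for section in sections:
        candidate = [s for s in kept if s is not section]
        # Keep the section only if removing it costs more than `tolerance`.
        if evaluate("\n\n".join(candidate)) >= baseline - tolerance:
            kept = candidate
    return "\n\n".join(kept)


if __name__ == "__main__":
    # Toy usage: a scorer that only rewards one surviving instruction.
    score = lambda p: 1.0 if "Answer in JSON." in p else 0.0
    template = "Answer in JSON.\n\nBe verbose.\n\nAlways cite sources."
    print(prune_prompt(template, score))  # -> "Answer in JSON."
```

Note that this single-pass greedy ablation costs one `evaluate` call per section; the paper's LLM-driven attribution estimator is its answer to exactly that cost, reporting over 50% lower compression latency.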