CopySpec: Accelerating LLMs with Speculative Copy-and-Paste
Razvan-Gabriel Dumitru, Minglai Yang, Vikas Yadav, Mihai Surdeanu
Abstract
We introduce CopySpec, a simple yet effective technique to tackle the inefficiencies LLMs face when generating responses that closely resemble previous outputs or that can be extracted verbatim from the context. CopySpec identifies repeated sequences in the model’s chat history or context and speculates that the same tokens will follow, enabling seamless copying without compromising output quality and without requiring additional GPU memory. To evaluate the effectiveness of our approach, we conducted experiments using seven LLMs and five datasets: MT-Bench, CNN/DM, GSM8K, HumanEval, and our newly created dataset, MT-Redundant. MT-Redundant, introduced in this paper, transforms the second turn of MT-Bench into a request for variations of the first turn’s answer, simulating real-world scenarios where users request modifications to prior responses. Our results demonstrate significant speed-ups: up to 2.35x on CNN/DM, 3.08x on the second turn of select MT-Redundant categories, and 2.66x on the third turn of GSM8K’s self-correction tasks. Importantly, we show that CopySpec integrates seamlessly with speculative decoding, yielding an average 49% additional speed-up over speculative decoding alone on the second turn of MT-Redundant across all eight categories. While LLMs, even with speculative decoding, suffer from slower inference as context size grows, CopySpec leverages larger contexts to accelerate inference, making it a faster complementary solution. Our code and dataset are publicly available at https://github.com/RazvanDu/CopySpec.
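The copy-speculation idea described above can be illustrated with a minimal sketch: take the last few generated tokens, look for an earlier occurrence of that span in the context or prior output, and propose the tokens that followed it as a speculative draft for the target model to verify. This is our own illustrative pseudocode, not the paper's implementation; the function names (`propose_copy_draft`, `accept_verified_prefix`) and the parameters `match_len` and `draft_len` are placeholders.

```python
# Minimal sketch of speculative copy-and-paste over token IDs.
# Illustrative only; names and defaults are assumptions, not the paper's code.

from typing import Optional


def propose_copy_draft(
    context: list[int],      # token IDs of the chat history / prompt
    generated: list[int],    # token IDs produced so far in the current response
    match_len: int = 5,      # suffix length used to detect a repeated span
    draft_len: int = 10,     # number of tokens to speculatively copy
) -> Optional[list[int]]:
    """If the last `match_len` generated tokens also occur earlier, speculate
    that the tokens following that occurrence will repeat and return them."""
    if len(generated) < match_len:
        return None
    suffix = generated[-match_len:]
    haystack = context + generated[:-match_len]  # history plus earlier output
    # Scan from the end so the most recent occurrence wins.
    for start in range(len(haystack) - match_len, -1, -1):
        if haystack[start:start + match_len] == suffix:
            copy_from = start + match_len
            draft = haystack[copy_from:copy_from + draft_len]
            return draft or None
    return None


def accept_verified_prefix(draft: list[int], model_tokens: list[int]) -> list[int]:
    """Standard speculative-decoding acceptance: keep the longest prefix of the
    draft that matches what the target model produces in a single verification
    pass, so output quality is identical to plain autoregressive decoding."""
    accepted = []
    for d, m in zip(draft, model_tokens):
        if d != m:
            break
        accepted.append(d)
    return accepted
```

Because the draft comes from string matching rather than a draft model, accepted tokens cost only one verification pass, which is why longer, more repetitive contexts speed generation up rather than slowing it down.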
- Anthology ID: 2025.emnlp-main.1337
- Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 26312–26343
- URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1337/
- Cite (ACL): Razvan-Gabriel Dumitru, Minglai Yang, Vikas Yadav, and Mihai Surdeanu. 2025. CopySpec: Accelerating LLMs with Speculative Copy-and-Paste. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 26312–26343, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): CopySpec: Accelerating LLMs with Speculative Copy-and-Paste (Dumitru et al., EMNLP 2025)
- PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1337.pdf