CARVQ: Corrective Adaptor with Group Residual Vector Quantization for LLM Embedding Compression
Dayin Gou, Sanghyun Byun, Nilesh Malpeddi, Gabrielle De Micheli, Prathamesh Vaste, Jacob Song, Woo Seong Chung
Abstract
Large Language Models (LLMs) typically rely on a large number of parameters for token embeddings, leading to substantial storage requirements and memory footprints. In particular, LLMs deployed on edge devices are memory-bound, and reducing the memory footprint by compressing the embedding layer not only frees up memory bandwidth but also speeds up inference. To address this, we introduce CARVQ, a novel post-training Corrective Adaptor combined with group Residual Vector Quantization. CARVQ relies on the composition of both linear and non-linear maps and mimics the original model embedding, compressing it to approximately 1.6 bits per parameter without requiring specialized hardware for lower-bit storage. We test our method on pre-trained LLMs such as LLaMA-3.2-1B, LLaMA-3.2-3B, LLaMA-3.2-3B-Instruct, LLaMA-3.1-8B, Qwen2.5-7B, Qwen2.5-Math-7B and Phi-4, evaluating on common generative, discriminative, math and reasoning tasks. We show that, in most cases, CARVQ achieves a lower average bitwidth per parameter than scalar quantization while maintaining reasonable perplexity and accuracy. Our contributions include a novel compression technique that is compatible with state-of-the-art transformer quantization methods and can be seamlessly integrated into any hardware supporting 4-bit memory to reduce the model's memory footprint on memory-constrained devices. This work demonstrates a crucial step toward the efficient deployment of LLMs on edge devices.
- Anthology ID: 2025.findings-emnlp.1009
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 18594–18604
- URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1009/
- DOI: 10.18653/v1/2025.findings-emnlp.1009
- Cite (ACL): Dayin Gou, Sanghyun Byun, Nilesh Malpeddi, Gabrielle De Micheli, Prathamesh Vaste, Jacob Song, and Woo Seong Chung. 2025. CARVQ: Corrective Adaptor with Group Residual Vector Quantization for LLM Embedding Compression. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 18594–18604, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): CARVQ: Corrective Adaptor with Group Residual Vector Quantization for LLM Embedding Compression (Gou et al., Findings 2025)
- PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1009.pdf
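For intuition about the quantization scheme named in the title and abstract, below is a minimal, self-contained sketch of group residual vector quantization applied to an embedding table. This is not the authors' CARVQ implementation and it omits the Corrective Adaptor entirely; the class name `GroupRVQ`, the plain k-means codebook fitting, and the default group/stage/codebook sizes are illustrative assumptions only.

```python
# Minimal sketch of group residual vector quantization (RVQ) for an embedding table.
# NOTE: this is NOT the authors' CARVQ code and omits the Corrective Adaptor; the class
# name, k-means fitting, and default sizes below are illustrative assumptions only.
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Fit one codebook with plain k-means; x: (n, d) -> centroids: (k, d)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(np.float64)
    for _ in range(iters):
        assign = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):
                centroids[j] = members.mean(0)
    return centroids


def nearest(x, book):
    """Index of the nearest codeword for every row of x."""
    return ((x[:, None, :] - book[None, :, :]) ** 2).sum(-1).argmin(1)


class GroupRVQ:
    """Split columns into `groups`; within each group, quantize the residual over
    `stages` rounds so every row is stored as a handful of small integer codes."""

    def __init__(self, groups=4, stages=2, codebook_size=256):
        self.groups, self.stages, self.k = groups, stages, codebook_size
        self.codebooks = []  # codebooks[g][s] has shape (codebook_size, d_g)

    def _splits(self, dim):
        return np.array_split(np.arange(dim), self.groups)

    def fit(self, emb):
        for cols in self._splits(emb.shape[1]):
            residual = emb[:, cols].astype(np.float64)
            books = []
            for _ in range(self.stages):
                book = kmeans(residual, self.k)
                residual = residual - book[nearest(residual, book)]  # quantize what is left
                books.append(book)
            self.codebooks.append(books)
        return self

    def encode(self, emb):
        codes = []
        for g, cols in enumerate(self._splits(emb.shape[1])):
            residual = emb[:, cols].astype(np.float64)
            for book in self.codebooks[g]:
                idx = nearest(residual, book)
                residual = residual - book[idx]
                codes.append(idx.astype(np.uint8))   # one byte per row, group, stage
        return np.stack(codes, axis=1)               # (vocab, groups * stages)

    def decode(self, codes):
        parts, c = [], 0
        for books in self.codebooks:
            parts.append(sum(book[codes[:, c + s]] for s, book in enumerate(books)))
            c += len(books)
        return np.concatenate(parts, axis=1)         # same column order as the input


if __name__ == "__main__":
    emb = np.random.randn(1000, 64).astype(np.float32)        # toy "embedding table"
    rvq = GroupRVQ(groups=4, stages=2, codebook_size=256).fit(emb)
    approx = rvq.decode(rvq.encode(emb))
    print("reconstruction MSE:", float(np.mean((emb - approx) ** 2)))
```

In this toy configuration each 64-dimensional row is stored as 8 one-byte codes (1 bit per weight) plus shared codebooks amortized over the vocabulary; CARVQ's actual configuration, and the Corrective Adaptor it adds on top of the quantized embedding, are described in the paper.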