HIGGS: Pushing the Limits of Large Language Model Quantization via the Linearity Theorem
Vladimir Malinovskii, Andrei Panferov, Ivan Ilin, Han Guo, Peter Richtárik, Dan Alistarh
Abstract
Quantizing large language models has become a standard way to reduce their memory and computational costs. Typically, existing methods focus on breaking down the problem into individual layer-wise sub-problems, and minimizing per-layer error, measured via various metrics. Yet, this approach currently lacks theoretical justification and the metrics employed may be sub-optimal. In this paper, we present a “linearity theorem” establishing a direct relationship between the layer-wise reconstruction error and the model perplexity increase due to quantization. This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, which outperforms all prior data-free approaches such as the extremely popular NF4 quantized format, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels which match a given compression constraint, obtained by reduction to dynamic programming. On the practical side, we demonstrate improved accuracy-compression trade-offs on Llama-family models, advancing both data-free and non-uniform quantization for large language models.
- Anthology ID: 2025.naacl-long.543
- Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
- Month: April
- Year: 2025
- Address: Albuquerque, New Mexico
- Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 10857–10886
- URL: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.naacl-long.543/
- Cite (ACL): Vladimir Malinovskii, Andrei Panferov, Ivan Ilin, Han Guo, Peter Richtárik, and Dan Alistarh. 2025. HIGGS: Pushing the Limits of Large Language Model Quantization via the Linearity Theorem. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 10857–10886, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal): HIGGS: Pushing the Limits of Large Language Model Quantization via the Linearity Theorem (Malinovskii et al., NAACL 2025)
- PDF: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.naacl-long.543.pdf
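The abstract describes HIGGS as a data-free method built on "Hadamard rotations and MSE-optimal grids". Below is a minimal, illustrative Python sketch of that idea, not the authors' implementation: it applies a randomized Hadamard rotation to a weight matrix, fits an MSE-optimal (Lloyd-Max) scalar grid to a Gaussian reference, and rounds each rotated entry to its nearest grid level. The function names (`lloyd_max_grid`, `higgs_like_quantize`), the scalar-grid choice, and all parameters are hypothetical; the paper's exact grid construction and the dynamic-programming bit allocation are omitted.

```python
# Minimal sketch of the "Hadamard rotation + MSE-optimal grid" idea from the abstract.
# Not the authors' implementation; names and parameters below are illustrative only.
import numpy as np
from scipy.linalg import hadamard  # requires the dimension to be a power of two


def lloyd_max_grid(samples, num_levels, iters=50):
    """Fit an MSE-optimal scalar grid to `samples` with Lloyd's algorithm (1-D k-means)."""
    levels = np.quantile(samples, np.linspace(0.05, 0.95, num_levels))
    for _ in range(iters):
        idx = np.abs(samples[:, None] - levels[None, :]).argmin(axis=1)
        for k in range(num_levels):
            if np.any(idx == k):
                levels[k] = samples[idx == k].mean()  # recenter level at its cell mean
    return np.sort(levels)


def higgs_like_quantize(W, bits=3, seed=0):
    """Data-free sketch: randomized Hadamard rotation, then nearest-level rounding."""
    n = W.shape[1]
    rng = np.random.default_rng(seed)
    H = hadamard(n) / np.sqrt(n)                   # orthonormal Hadamard matrix
    D = np.diag(rng.choice([-1.0, 1.0], size=n))   # random sign flips
    R = D @ H                                      # randomized orthogonal rotation
    Wr = W @ R                                     # rotated weights are approximately Gaussian
    grid = Wr.std() * lloyd_max_grid(rng.standard_normal(100_000), 2 ** bits)
    idx = np.abs(Wr[..., None] - grid).argmin(axis=-1)  # nearest grid level per entry
    return grid[idx] @ R.T, idx, grid              # rotate back; store (idx, grid) in practice


if __name__ == "__main__":
    W = np.random.default_rng(1).standard_normal((64, 128))
    W_hat, _, _ = higgs_like_quantize(W, bits=3)
    print("relative L2 error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

Rounding to the nearest level of an MSE-optimal grid minimizes the layer-wise reconstruction error, which is the quantity the abstract's linearity theorem ties to the model's perplexity increase.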