Lei Wang
2026
BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook
Hao Gu | Lujun Li | Hao Wang | Lei Wang | Zheyu Wang | Bei Liu | Jiacheng Liu | Qiyuan Zhu | Sirui Han | Yike Guo
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Binary quantization represents the most extreme form of compression, reducing weights to ±1 for maximal memory and computational efficiency. While recent sparsity-aware binarization achieves sub-1-bit compression via weight pruning, it faces critical challenges: performance degradation, mask-management overhead, and limited hardware compatibility. In this paper, we present BTC-LLM, a novel sub-1-bit LLM quantization framework that leverages binary pattern clustering and weight transformation to overcome these limitations. Our approach incorporates two key innovations: (1) a Binary Codebook that clusters recurring binary vectors into compact indices using custom distance metrics and sign-based updates; (2) a Learnable Transformation that reduces outliers and promotes shared sign patterns among binary weights. Together, these remove the need for sparse masks, enabling efficient inference on standard hardware. Extensive evaluations across the LLaMA, Qwen, and FBI-LLM families demonstrate that BTC-LLM achieves state-of-the-art results in extreme compression (1.11–0.7 bits). Notably, BTC-LLM compressed to 0.8 bits on LLaMA-2-13B maintains high performance, with only a 3.1% accuracy drop in zero-shot benchmarks, while delivering a 1.6× speedup over FP16.
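The binary codebook idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain Hamming distance in place of the paper's custom distance metrics, a majority-vote sign update for the codewords, and random-row initialisation. The names `binary_codebook` and `bits_per_weight` and the example sizes are hypothetical. It shows how storing one index per length-`v` binary vector can bring the index cost below 1 bit per weight (codebook storage is ignored here).

```python
import math
import numpy as np

def binary_codebook(B, k, iters=10, seed=0):
    """Cluster rows of a +/-1 matrix B of shape (n, v) into k binary codewords.

    Assumed scheme: nearest codeword by Hamming distance, then a sign-based
    (majority-vote) update of each codeword.
    """
    B = np.asarray(B, dtype=np.int32)
    rng = np.random.default_rng(seed)
    # Assumption: initialise codewords from k distinct randomly chosen rows.
    C = B[rng.choice(len(B), size=k, replace=False)].copy()
    for _ in range(iters):
        # For +/-1 vectors, Hamming distance = (v - dot) / 2,
        # so the nearest codeword is the one with the largest dot product.
        idx = np.argmax(B @ C.T, axis=1)
        for j in range(k):
            members = B[idx == j]
            if len(members):
                # Sign-based update: per-coordinate majority vote, ties go to +1.
                C[j] = np.where(members.sum(axis=0) >= 0, 1, -1)
    # Final assignment against the updated codewords.
    idx = np.argmax(B @ C.T, axis=1)
    return C, idx

def bits_per_weight(k, v):
    """Index cost per weight when each length-v vector stores a log2(k)-bit index."""
    return math.log2(k) / v
```

For example, with hypothetical sizes `v = 16` and `k = 2**12`, each group of 16 binary weights is replaced by a 12-bit index, so `bits_per_weight(2**12, 16)` gives 0.75 bits per weight before accounting for the codebook itself.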