Achieving binary weight and activation for LLMs using Post-Training Quantization
Siqing Song, Chuang Wang, Rui-Qi Wang, Yi Yang, Xu-Yao Zhang
Abstract
Quantizing large language models (LLMs) to 1-bit precision significantly reduces computational costs, but existing quantization techniques suffer from noticeable performance degradation when using weight and activation precisions below 4 bits (W4A4). In this paper, we propose a post-training quantization framework with a W(1+1)A(1×4) configuration, where weights are quantized to 1 bit with an additional 1 bit for fine-grained grouping, and activations are quantized to 1 bit with a 4-fold increase in the number of channels. For weight quantization, we propose utilizing Hessian-aware fine-grained grouping along with an EM-based quantization scheme. For activation quantization, we equivalently decompose INT4-quantized activations into a 4 × INT1 format and simultaneously smooth the scaling factors based on quantization errors, which further reduces the quantization errors in activations. Our method surpasses state-of-the-art (SOTA) LLM quantization baselines on W2A4 across multiple tasks, pushing the boundaries of existing LLM quantization methods toward fully binarized models. Code is available at https://github.com/JimmyCrave/LLM-PTQ-binarization.
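The abstract's claim that INT4-quantized activations can be decomposed into a 4 × INT1 format with a 4-fold channel increase follows from a bit-plane identity: an unsigned 4-bit code x equals the sum over k of 2^k · b_k with b_k ∈ {0, 1}, so one INT4 matmul can be rewritten as binary-activation matmuls over replicated, power-of-two-scaled weight rows. The sketch below is a minimal NumPy illustration of that equivalence only; it is not the authors' released code, the unsigned-code convention and all shapes are assumptions, and the paper's Hessian-aware grouping and error-based scale smoothing are not reproduced here.

```python
# Minimal sketch (assumed unsigned INT4 codes): rewrite an INT4-quantized
# activation exactly as four INT1 bit-planes with a 4x channel expansion.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear layer y = x @ W with INT4-quantized activations x in [0, 15].
batch, in_ch, out_ch = 2, 8, 4
x_int4 = rng.integers(0, 16, size=(batch, in_ch))    # unsigned INT4 codes
W = rng.standard_normal((in_ch, out_ch))             # weights (float here for simplicity)

# Bit-plane decomposition: x = sum_k 2^k * b_k, with b_k in {0, 1}.
bits = np.stack([(x_int4 >> k) & 1 for k in range(4)], axis=-1)   # (batch, in_ch, 4)
x_int1 = bits.reshape(batch, in_ch * 4)                           # 4x channel expansion

# Fold the powers of two into replicated weight rows so the binary
# activations can be used directly: W_expanded is (in_ch * 4, out_ch).
scales = np.array([1.0, 2.0, 4.0, 8.0])                           # 2^k per bit-plane
W_expanded = (W[:, None, :] * scales[None, :, None]).reshape(in_ch * 4, out_ch)

# The binary-activation matmul reproduces the INT4-activation matmul exactly.
y_int4 = x_int4 @ W
y_int1 = x_int1 @ W_expanded
assert np.allclose(y_int4, y_int1)
print("max abs diff:", np.abs(y_int4 - y_int1).max())
```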
- Anthology ID:
- 2025.findings-acl.459
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2025
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 8782–8795
- URL:
- https://preview.aclanthology.org/landing_page/2025.findings-acl.459/
- Cite (ACL):
- Siqing Song, Chuang Wang, Rui-Qi Wang, Yi Yang, and Xu-Yao Zhang. 2025. Achieving binary weight and activation for LLMs using Post-Training Quantization. In Findings of the Association for Computational Linguistics: ACL 2025, pages 8782–8795, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Achieving binary weight and activation for LLMs using Post-Training Quantization (Song et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/landing_page/2025.findings-acl.459.pdf