ACBQ: Adaptive Cross-Block Quantization of Large Language Models
Hailing Wang, Jianglin Lu, Yitian Zhang, Huimin Zeng, Yun Fu
Abstract
Post-training quantization (PTQ) has emerged as a promising approach for reducing the memory footprint and computational cost of large language models (LLMs), enabling efficient deployment without full model retraining. However, existing PTQ methods struggle to simultaneously support weight–activation joint quantization and extreme low-bit weight quantization. This limitation primarily arises from the depth of LLMs and their strong cross-layer dependencies, which cause quantization errors to propagate and accumulate across layers, ultimately leading to significant performance degradation. In this paper, we present ACBQ, a simple yet effective framework that simultaneously addresses weight–activation joint quantization and extreme weight quantization. We first propose a granular quantization strategy that treats self-attention and FFN as separate quantization units with module-specific optimization objectives. To mitigate the propagation and accumulation of quantization errors across layers, we introduce an adaptive cross-block quantization strategy that explicitly accounts for cross-layer dependencies by encouraging consistency across blocks. Extensive experiments across diverse LLMs, including OPT and the LLaMA family, demonstrate that ACBQ achieves superior performance under both W4A4 and highly aggressive W2 settings, while incurring negligible additional computational overhead.
- Anthology ID:
- 2026.acl-long.1971
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 42578–42592
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1971/
- Cite (ACL):
- Hailing Wang, Jianglin Lu, Yitian Zhang, Huimin Zeng, and Yun Fu. 2026. ACBQ: Adaptive Cross-Block Quantization of Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 42578–42592, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- ACBQ: Adaptive Cross-Block Quantization of Large Language Models (Wang et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1971.pdf
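
For readers unfamiliar with the notation in the abstract, W4A4 denotes 4-bit weights with 4-bit activations, and W2 denotes 2-bit weights. The sketch below is not ACBQ. It is a plain round-to-nearest baseline on a toy stack of random layers, meant only to illustrate how weight quantization error can carry through successive layers. The function names `quantize_rtn` and `relative_error`, the layer sizes, and the `tanh` nonlinearity are all assumptions chosen for illustration, and activations are left in full precision.

```python
import numpy as np


def quantize_rtn(x, bits):
    """Symmetric per-tensor round-to-nearest quantization.

    Returns the dequantized tensor (same shape and dtype as the input),
    so it can be compared directly with the full-precision values.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4 bits, 1 for 2 bits
    scale = np.max(np.abs(x)) / qmax
    if scale == 0:
        return x.copy()
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale


def relative_error(bits, depth=8, dim=64, seed=0):
    """Relative output error after each layer of a toy random network.

    The same random weights are applied in full precision and in
    quantized form; only the weights are quantized (a weight-only setting).
    """
    rng = np.random.default_rng(seed)
    x_fp = rng.standard_normal((16, dim))  # toy calibration batch
    x_q = x_fp.copy()
    errors = []
    for _ in range(depth):
        w = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        x_fp = np.tanh(x_fp @ w)
        x_q = np.tanh(x_q @ quantize_rtn(w, bits))
        err = np.linalg.norm(x_q - x_fp) / np.linalg.norm(x_fp)
        errors.append(float(err))
    return errors


if __name__ == "__main__":
    for bits in (8, 4, 2):
        errs = relative_error(bits)
        print(f"W{bits}: " + ", ".join(f"{e:.3f}" for e in errs))
```

Running the script prints the per-layer relative error for 8-, 4-, and 2-bit weights on the same toy network, which gives a rough feel for why very low bit widths are harder to handle than moderate ones.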