TAGQuant: Token-Aware Clustering for Group-Wise Quantization
Jaeseong Lee, Seung-won Hwang, Aurick Qiao, Zhewei Yao, Yuxiong He
Abstract
Grouping (e.g., of channels), widely used in current integer-based quantization, has become essential for the emerging MXFP4 format. Ideally, each group should contain channels with similar quantization scales. To form such groups, existing work clusters channels using a scalar proxy that ignores the token dimension, which we find suboptimal. In this paper, we propose TAGQuant, a simple yet powerful enhancement for such “group-wise” quantization. By strategically shuffling channels to group those with similar token-wise activation distributions, TAGQuant ensures better clustering of large- and small-range values. This shuffle operation is hardware-efficient and integrates seamlessly into the quantization process, adding only 0.01x latency overhead. Compared to baselines, TAGQuant reduces relative GSM8K error in both INT4 and MXFP4 formats by up to 86% on Llama-3.1-8B-Instruct, validating the effectiveness of our channel-shuffling approach for group-wise quantization. Code is publicly available.
- Anthology ID: 2026.eacl-industry.18
- Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
- Month: March
- Year: 2026
- Address: Rabat, Morocco
- Editors: Yevgen Matusevych, Gülşen Eryiğit, Nikolaos Aletras
- Venue: EACL
- Publisher: Association for Computational Linguistics
- Pages: 253–262
- URL: https://preview.aclanthology.org/ingest-eacl/2026.eacl-industry.18/
- Cite (ACL): Jaeseong Lee, Seung-won Hwang, Aurick Qiao, Zhewei Yao, and Yuxiong He. 2026. TAGQuant: Token-Aware Clustering for Group-Wise Quantization. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track), pages 253–262, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal): TAGQuant: Token-Aware Clustering for Group-Wise Quantization (Lee et al., EACL 2026)
- PDF: https://preview.aclanthology.org/ingest-eacl/2026.eacl-industry.18.pdf
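The channel-shuffling idea summarized in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation and rests on several assumptions. Activations `X` are a tokens × channels matrix. Quantization is symmetric INT4-style, with one absmax scale per group of `G` contiguous channels. The "scalar proxy" baseline is taken to be each channel's absmax over all tokens. The token-aware grouping is a hypothetical greedy nearest-neighbour scheme over per-token log-magnitude profiles, standing in for TAGQuant's actual clustering. All function names are illustrative. The sketch also checks that a channel shuffle needs no extra compute at inference time: permuting the activation columns and the matching weight rows leaves the matrix product unchanged.

```python
import math


def quantize_groupwise(row, group_size, qmax=7):
    """Symmetric INT4-style fake quantization of one token's activations.

    Channels are split into contiguous groups of `group_size`; each group
    shares a single scale derived from its absolute maximum.
    """
    out = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        scale = max(abs(v) for v in group) / qmax or 1.0  # avoid a zero scale
        out.extend(max(-qmax, min(qmax, round(v / scale))) * scale for v in group)
    return out


def quant_error(X, perm, group_size):
    """Total squared error when channels are reordered by `perm` before grouping."""
    err = 0.0
    for row in X:  # each row holds one token's channel values
        shuffled = [row[c] for c in perm]
        q = quantize_groupwise(shuffled, group_size)
        err += sum((a - b) ** 2 for a, b in zip(shuffled, q))
    return err


def scalar_proxy_perm(X):
    """Assumed baseline: sort channels by one scalar (absmax over all tokens)."""
    n = len(X[0])
    absmax = [max(abs(row[c]) for row in X) for c in range(n)]
    return sorted(range(n), key=lambda c: absmax[c])


def token_aware_perm(X, group_size):
    """Hypothetical greedy grouping of channels with similar per-token profiles."""
    n = len(X[0])
    # Profile of a channel: log-magnitude of its value on every token.
    profile = [[math.log(abs(row[c]) + 1e-6) for row in X] for c in range(n)]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(profile[a], profile[b]))

    unassigned = set(range(n))
    perm = []
    while unassigned:
        seed = min(unassigned)  # deterministic seed for the next group
        unassigned.remove(seed)
        nearest = sorted(unassigned, key=lambda c: (dist(seed, c), c))[:group_size - 1]
        unassigned.difference_update(nearest)
        perm.extend([seed] + nearest)
    return perm


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def shuffle_pair(X, W, perm):
    """Permute activation columns and the matching weight rows together."""
    Xp = [[row[c] for c in perm] for row in X]
    Wp = [W[c] for c in perm]
    return Xp, Wp


# Two tokens, four channels: channels 0/2 peak on token 0, channels 1/3 on token 1.
X = [[70.0, 13.0, 63.0, 12.0],
     [13.0, 70.0, 12.0, 63.0]]
G = 2
base = scalar_proxy_perm(X)     # groups channels that peak on different tokens
aware = token_aware_perm(X, G)  # groups channels that peak on the same token
print(quant_error(X, base, G), quant_error(X, aware, G))

# The shuffle preserves the layer output when W's rows move with X's columns.
W = [[1, 2], [3, 4], [5, 6], [7, 8]]
Xi = [[1, 2, 3, 4], [5, 6, 7, 8]]
Xp, Wp = shuffle_pair(Xi, W, aware)
assert matmul(Xp, Wp) == matmul(Xi, W)
```

In this toy input every channel has a similar absmax, so the scalar proxy pairs a peaking channel with a non-peaking one on each token, and the small value is rounded coarsely. Grouping by per-token profile keeps large and small values in separate groups, which lowers the total error. Because the permutation can be folded into the weight layout offline, only the activation reordering remains at run time.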