Half-S: Halving the Scale for Near-Lossless 4-Bit LLM Training
Jinyang Du, Ruihao Gong, Linghan Ai, Zining Wang, Yunke Peng, Yao Wang, Lei Yan, Wxuefei, Yaoyuan Wang, Jinyang Guo, Dahua Lin, Xianglong Liu
Abstract
Training large language models (LLMs) at 4-bit precision offers substantial efficiency gains but remains challenging due to the limited dynamic range and coarse numerical resolution. Existing 4-bit training pipelines typically rely on max-scaling, which is ill-suited for heavy-tailed LLM tensor distributions and leads to severe under-utilization of the FP4 quantization grid in the low-magnitude region. This effect causes pronounced representation collapse and large rounding errors for the values that dominate LLM computation. In this work, we derive the theoretically optimal scaling for FP4 under heavy-tailed inputs, revealing why max-scaling is intrinsically suboptimal. Guided by this analysis, we propose Half-S, a simple and efficient scaling strategy that uses half-scaling as a hardware-friendly default and falls back to an MSE-based clipping threshold when needed, yielding a close approximation to the theoretical optimum under real LLM statistics. Extensive experiments on large-scale pretraining and downstream fine-tuning show that Half-S consistently narrows the gap to BF16 in both convergence and final model quality, while preserving the efficiency benefits of 4-bit computation. Under native FP4 support, Half-S is estimated to provide up to 1.8× end-to-end training speedup. These results indicate that Half-S provides a simple and effective correction to max-scaling, substantially improving the stability and accuracy of 4-bit LLM training.
- Anthology ID: 2026.findings-acl.241
- Volume: Findings of the Association for Computational Linguistics: ACL 2026
- Month: July
- Year: 2026
- Address: San Diego, California, United States
- Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 4890–4903
- URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.241/
- Cite (ACL): Jinyang Du, Ruihao Gong, Linghan Ai, Zining Wang, Yunke Peng, Yao Wang, Lei Yan, Wxuefei, Yaoyuan Wang, Jinyang Guo, Dahua Lin, and Xianglong Liu. 2026. Half-S: Halving the Scale for Near-Lossless 4-Bit LLM Training. In Findings of the Association for Computational Linguistics: ACL 2026, pages 4890–4903, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal): Half-S: Halving the Scale for Near-Lossless 4-Bit LLM Training (Du et al., Findings 2026)
- PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.241.pdf
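The abstract contrasts max-scaling with half-scaling for FP4 under heavy-tailed inputs. The Python sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes the FP4 E2M1 format (representable magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) and a single per-tensor scale. It also reads "half-scaling" as halving the max-scaling factor, which is equivalent to clipping at amax / 2. The names `mse_search_scale` and `choose_scale` are hypothetical. The abstract does not specify the grid search or the condition under which Half-S falls back to the MSE-based threshold, so the fallback trigger used here is invented purely for illustration.

```python
import math
import random

# Non-negative magnitudes representable in FP4 E2M1 (OCP Microscaling format).
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = FP4_E2M1_GRID[-1]


def quantize_fp4(x, scale):
    """Fake-quantize x: divide by scale, saturate at +-6, round to the nearest
    FP4 magnitude (ties go to the smaller one), then multiply back."""
    mag = min(abs(x) / scale, FP4_MAX)
    q = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))
    return math.copysign(q * scale, x)


def mse(xs, scale):
    """Mean squared quantization error of a tensor (list of floats) at one scale."""
    return sum((x - quantize_fp4(x, scale)) ** 2 for x in xs) / len(xs)


def max_scale(xs):
    """Max-scaling: the largest magnitude lands exactly on the top FP4 value.
    Assumes xs contains at least one nonzero value."""
    return max(abs(x) for x in xs) / FP4_MAX


def half_scale(xs):
    """Half-scaling as read from the title: half the max-scaling factor, so values
    above amax / 2 saturate while small values see a finer grid."""
    return max_scale(xs) / 2


def mse_search_scale(xs, num_candidates=64):
    """Hypothetical MSE-based clipping search over thresholds amax * k / num_candidates.
    With the default num_candidates=64 the candidates include both the
    max-scaling and the half-scaling factors."""
    amax = max(abs(x) for x in xs)
    candidates = [amax * k / num_candidates / FP4_MAX for k in range(1, num_candidates + 1)]
    return min(candidates, key=lambda s: mse(xs, s))


def choose_scale(xs):
    """Hypothetical fallback rule (not from the paper): keep half-scaling unless it
    is worse than max-scaling on this tensor, then fall back to the MSE search."""
    s_half = half_scale(xs)
    if mse(xs, s_half) <= mse(xs, max_scale(xs)):
        return s_half
    return mse_search_scale(xs)


def student_t(rng, df=3):
    """Heavy-tailed sample: a standard normal divided by sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / math.sqrt(chi2 / df)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [student_t(rng) for _ in range(2048)]
    for name, s in [("max", max_scale(xs)), ("half", half_scale(xs)),
                    ("mse-search", mse_search_scale(xs))]:
        print(f"{name:>10}: scale={s:.4f}  mse={mse(xs, s):.5f}")
```

Running the script prints the scale and quantization MSE that each strategy selects on a seeded Student-t sample. Consider a tensor of many small values plus one outlier, such as `[0.4] * 1000 + [12.0]`. Max-scaling uses a scale of 2, so every 0.4 rounds to 0. Half-scaling uses a scale of 1, so each 0.4 rounds to 0.5, and only the outlier is clipped to 6. The total error therefore drops from 160 to about 46. Real FP4 kernels typically assign a separate scale to each small block of values rather than to the whole tensor. The per-tensor form is kept here only for brevity.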