Half-S: Halving the Scale for Near-Lossless 4-Bit LLM Training

Jinyang Du; Ruihao Gong; Linghan Ai; Zining Wang; Yunke Peng; Yao Wang; Lei Yan; Wxuefei; Yaoyuan Wang; Jinyang Guo; Dahua Lin; Xianglong Liu

Abstract

Training large language models (LLMs) at 4-bit precision offers substantial efficiency gains but remains challenging due to the limited dynamic range and coarse numerical resolution. Existing 4-bit training pipelines typically rely on max-scaling, which is ill-suited for heavy-tailed LLM tensor distributions and leads to severe under-utilization of the FP4 quantization grid in the low-magnitude region. This effect causes pronounced representation collapse and large rounding errors for the values that dominate LLM computation. In this work, we derive the theoretically optimal scaling for FP4 under heavy-tailed inputs, revealing why max-scaling is intrinsically suboptimal. Guided by this analysis, we propose Half-S, a simple and efficient scaling strategy that uses half-scaling as a hardware-friendly default and falls back to an MSE-based clipping threshold when needed, yielding a close approximation to the theoretical optimum under real LLM statistics. Extensive experiments on large-scale pretraining and downstream fine-tuning show that Half-S consistently narrows the gap to BF16 in both convergence and final model quality, while preserving the efficiency benefits of 4-bit computation. Under native FP4 support, Half-S is estimated to provide up to 1.8× end-to-end training speedup. These results indicate that Half-S provides a simple and effective correction to max-scaling, substantially improving the stability and accuracy of 4-bit LLM training.
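The contrast between max-scaling and a halved scale can be illustrated with a small fake-quantization sketch. It assumes the standard FP4 E2M1 magnitude grid and reads "half-scaling" as using half of the max-scaling scale, so values above half the maximum saturate. The helper names (`quantize_fp4`, `mse`, `pick_scale`) and the two-candidate selection rule are hypothetical and stand in for the MSE-based fallback only loosely; this is not the authors' procedure.

```python
# Illustrative sketch only: names and the selection rule are assumptions,
# not the procedure from the paper.

# Non-negative magnitudes of the FP4 E2M1 format (the sign is a separate bit).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = FP4_GRID[-1]


def quantize_fp4(values, scale):
    """Divide by scale, saturate at FP4_MAX, round to the nearest grid point, rescale."""
    out = []
    for v in values:
        mag = min(abs(v) / scale, FP4_MAX)  # outliers beyond the grid are clipped
        q = min(FP4_GRID, key=lambda g: abs(g - mag))
        out.append((q if v >= 0 else -q) * scale)
    return out


def mse(values, scale):
    """Mean squared error between the values and their FP4 reconstruction."""
    deq = quantize_fp4(values, scale)
    return sum((x - y) ** 2 for x, y in zip(values, deq)) / len(values)


def max_scale(values):
    """Max-scaling: map the largest magnitude onto the largest FP4 value."""
    return max(abs(v) for v in values) / FP4_MAX


def pick_scale(values):
    """Assumed rule: prefer half of the max-scaling scale, keep max-scaling if its MSE is lower."""
    s_max = max_scale(values)
    if s_max == 0.0:
        return 1.0  # all-zero input: any scale reconstructs it exactly
    # min() returns the first candidate on ties, so the halved scale is the default.
    return min((0.5 * s_max, s_max), key=lambda s: mse(values, s))


# Dense low-magnitude bulk plus one large value: the finer grid pays off.
dense = [0.25, -0.75] * 128 + [6.0]
# Short block dominated by a single outlier: clipping it costs too much.
outlier = [0.1 * k for k in range(-15, 16)] + [12.0]

for name, block in (("dense", dense), ("outlier", outlier)):
    s_max = max_scale(block)
    print(name, "max-scaling MSE:", mse(block, s_max),
          "half-scale MSE:", mse(block, 0.5 * s_max),
          "picked scale:", pick_scale(block))
```

In this toy setting the halved scale wins when many small values share the tensor with a few large ones, while the guard keeps max-scaling for a block whose error would be dominated by one clipped outlier. Both inputs are hand-constructed for illustration and say nothing about real LLM tensor statistics.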

Anthology ID: 2026.findings-acl.241
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4890–4903
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.241/
Cite (ACL): Jinyang Du, Ruihao Gong, Linghan Ai, Zining Wang, Yunke Peng, Yao Wang, Lei Yan, Wxuefei, Yaoyuan Wang, Jinyang Guo, Dahua Lin, and Xianglong Liu. 2026. Half-S: Halving the Scale for Near-Lossless 4-Bit LLM Training. In Findings of the Association for Computational Linguistics: ACL 2026, pages 4890–4903, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Half-S: Halving the Scale for Near-Lossless 4-Bit LLM Training (Du et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.241.pdf
Checklist: 2026.findings-acl.241.checklist.pdf
