Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats

Pengxiang Zhao, Hui-Ling Zhen, Xing Li, Han Bao, Weizhe Lin, Zhiyuan Yang, Yu Zi Wei, Xin Wang, Mingxuan Yuan, Xianzhi Yu, Zhenhua Dong


Abstract
As LLMs scale, low-bit floating-point formats like MXFP and NVFP4 offer new opportunities for precision and efficiency. In this work, we evaluate HiFloat (HiF8 and HiF4), a family of formats tailored for Ascend NPUs. Through rigorous comparison across weight-activation and KV-cache tasks, we provide three key insights: (1) INT8 suits narrow-range data, while floating-point formats excel with high-variance data; (2) in 4-bit regimes, HiF4’s hierarchical scaling prevents the accuracy collapse seen in integer formats; and (3) HiFloat is fully compatible with state-of-the-art post-training quantization frameworks. Overall, HiFloat provides a solution for high-efficiency LLM inference on NPUs.
Anthology ID:
2026.acl-industry.57
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Yunyao Li, Georg Rehm, Mei Tu
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
844–864
Language:
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-industry.57/
DOI:
Bibkey:
Cite (ACL):
Pengxiang Zhao, Hui-Ling Zhen, Xing Li, Han Bao, Weizhe Lin, Zhiyuan Yang, Yu Zi Wei, Xin Wang, Mingxuan Yuan, Xianzhi Yu, and Zhenhua Dong. 2026. Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 844–864, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats (Zhao et al., ACL 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-industry.57.pdf