Improving Efficiency in Large Language Models via Extendable Block Floating Point Representation

Dongyang Li, Zeyang Li, Bosheng Liu, Jigang Wu


Abstract
Large language models (LLMs) have revolutionized natural language processing (NLP) tasks, yet their increasing size poses substantial challenges in terms of computational and memory resources. Block floating-point (BFP) arithmetic offers an effective solution by combining the strengths of floating-point and fixed-point representations, reducing both storage and computational overhead. However, current low-bit BFP quantization approaches often struggle to handle extreme outliers, leading to significant accuracy degradation. To overcome this limitation, we introduce Extendable Exponent Sharing (EES), a novel BFP representation that extends the exponent bit width to capture a wider dynamic range. EES achieves this by embedding extendable exponent bits into the least significant mantissa bits, thereby increasing the shared exponent’s bit width without incurring additional storage costs. To balance accuracy against energy efficiency, EES employs a design space exploration strategy to select the configuration of extendable exponent bit widths. Experimental results show that EES outperforms representative baselines in both accuracy and computational efficiency.
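The abstract describes the mechanism only at a high level, so the sketch below is a simplified, hypothetical illustration rather than the paper's actual encoding, kernel, or design space exploration. It contrasts a baseline block floating-point quantizer, whose narrow shared-exponent field saturates on extreme outliers, with an EES-style variant that widens the shared exponent by borrowing least significant mantissa bits. The block size, bit widths, outlier trigger, and the choice of which mantissas host the extension bits are all assumptions made for the example.

```python
import numpy as np


def bfp_quantize(block, mant_bits=8, exp_bits=4):
    """Baseline block floating point: one shared exponent per block and a
    signed `mant_bits`-bit mantissa per value.  Returns the dequantized
    block so the quantization error can be inspected directly."""
    max_abs = float(np.max(np.abs(block)))
    if max_abs == 0.0:
        return np.zeros_like(block, dtype=np.float64)
    q_hi = 2 ** (mant_bits - 1) - 1
    # Smallest power-of-two exponent whose scale lets the mantissa reach max_abs.
    ideal_exp = int(np.ceil(np.log2(max_abs / q_hi)))
    # The shared-exponent field is only `exp_bits` wide (signed), so it saturates
    # on extreme outliers and the largest values get clipped.
    exp_lo, exp_hi = -(2 ** (exp_bits - 1)), 2 ** (exp_bits - 1) - 1
    shared_exp = int(np.clip(ideal_exp, exp_lo, exp_hi))
    scale = 2.0 ** shared_exp
    mant = np.clip(np.round(block / scale), -(q_hi + 1), q_hi)
    return mant * scale


def ees_quantize(block, mant_bits=8, exp_bits=4, ext_bits=2):
    """EES-style sketch (hypothetical packing): when the ideal shared exponent
    overflows the baseline `exp_bits` field, the exponent is widened by
    `ext_bits` bits stored in the least significant mantissa bits of
    `ext_bits` values of the (1-D) block, so storage stays constant while
    the dynamic range grows."""
    block = np.asarray(block, dtype=np.float64).ravel()
    max_abs = float(np.max(np.abs(block)))
    if max_abs == 0.0:
        return np.zeros_like(block)
    exp_lo, exp_hi = -(2 ** (exp_bits - 1)), 2 ** (exp_bits - 1) - 1
    ideal_exp = int(np.ceil(np.log2(max_abs / (2 ** (mant_bits - 1) - 1))))
    eff_bits = np.full(block.shape, mant_bits)
    if exp_lo <= ideal_exp <= exp_hi:
        shared_exp = ideal_exp                      # baseline range suffices
    else:
        # Extended mode: the first `ext_bits` mantissas each donate one LSB to
        # the exponent (an illustrative choice of which mantissas host the bits).
        eff_bits[:ext_bits] -= 1
        ext_lo = -(2 ** (exp_bits + ext_bits - 1))
        ext_hi = 2 ** (exp_bits + ext_bits - 1) - 1
        # Conservative exponent choice: assume the largest value could itself be
        # a hosting mantissa, i.e. only `mant_bits - 1` bits are available to it.
        ideal_exp = int(np.ceil(np.log2(max_abs / (2 ** (mant_bits - 2) - 1))))
        shared_exp = int(np.clip(ideal_exp, ext_lo, ext_hi))
    scale = 2.0 ** shared_exp
    q_hi = 2 ** (eff_bits - 1) - 1                  # per-element mantissa range
    mant = np.clip(np.round(block / scale), -(q_hi + 1), q_hi)
    return mant * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    block = rng.normal(scale=0.05, size=16)   # typical small activations
    block[3] = 5.0e4                           # one extreme outlier
    for name, fn in (("BFP", bfp_quantize), ("EES", ees_quantize)):
        err = np.abs(block - fn(block))
        print(f"{name}: absolute error on the outlier = {err[3]:.1f}")
```

With these assumed settings, the baseline quantizer clips the outlier because its 4-bit shared exponent saturates, while the EES-style variant represents it with a much smaller error at the cost of one mantissa LSB in two values of the block. The paper's actual bit packing, block size, and the extension widths chosen by its design space exploration may differ from this illustration.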
Anthology ID:
2025.findings-acl.768
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
14861–14873
URL:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.768/
Cite (ACL):
Dongyang Li, Zeyang Li, Bosheng Liu, and Jigang Wu. 2025. Improving Efficiency in Large Language Models via Extendable Block Floating Point Representation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14861–14873, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Improving Efficiency in Large Language Models via Extendable Block Floating Point Representation (Li et al., Findings 2025)
PDF:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.768.pdf