QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models

Jiajun Zhou, Yifan Yang, Kai Zhen, Ziyue Liu, Yequan Zhao, Ershad Banijamali, Athanasios Mouchtaris, Ngai Wong, Zheng Zhang


Abstract
Large Language Models (LLMs) are often quantized to lower precision to reduce the memory cost and latency of inference. However, quantization often degrades model performance, so fine-tuning is required for various downstream tasks. Traditional fine-tuning methods such as stochastic gradient descent and Adam optimization require backpropagation, which is error-prone in low-precision settings. To overcome these limitations, we propose the Quantized Zeroth-Order (QuZO) framework, specifically designed for fine-tuning LLMs through low-precision (e.g., 4- or 8-bit) forward passes. Our method avoids the low-precision straight-through estimator, which requires backward computation, and instead utilizes optimized stochastic rounding to mitigate the increased bias. QuZO simplifies the training process while achieving results comparable to first-order methods in FP8 and superior accuracy in INT8 and INT4 training. Experiments demonstrate that QuZO achieves competitive performance on classification, multiple-choice, and generation tasks under low-bit training, including zero-shot reasoning tasks. Notably, QuZO incurs minimal overhead and reduces memory consumption by 2.94×–5.47× compared to quantized first-order methods during LLaMA-7B fine-tuning.
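To make the two ideas in the abstract concrete, the sketch below illustrates (i) stochastic rounding, which is unbiased in expectation, and (ii) a two-point zeroth-order (SPSA-style) gradient estimate that needs only forward-pass losses. This is a generic illustration under assumed names (stochastic_round_quantize, zo_gradient_step, loss_fn), not the authors' QuZO implementation or its optimized rounding scheme.

```python
# Minimal sketch (not the QuZO code): stochastic rounding plus a two-point
# zeroth-order update that uses only (possibly low-precision) forward passes.
import torch

def stochastic_round_quantize(x, num_bits=8):
    """Quantize a tensor to signed `num_bits` integers with stochastic rounding."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    scaled = x / scale
    floor = scaled.floor()
    # Round up with probability equal to the fractional part, down otherwise,
    # so the rounding error is zero in expectation (no systematic bias).
    rounded = floor + (torch.rand_like(x) < (scaled - floor)).float()
    return rounded.clamp(-qmax - 1, qmax) * scale

def zo_gradient_step(model, loss_fn, batch, lr=1e-6, eps=1e-3):
    """One zeroth-order update: perturb weights, compare two forward losses."""
    params = [p for p in model.parameters() if p.requires_grad]
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(sign):
        torch.manual_seed(seed)          # regenerate the same random directions z
        for p in params:
            z = torch.randn_like(p)
            p.data.add_(sign * eps * z)

    with torch.no_grad():
        perturb(+1)                      # theta + eps * z
        loss_plus = loss_fn(model, batch)     # forward pass only
        perturb(-2)                      # theta - eps * z
        loss_minus = loss_fn(model, batch)    # forward pass only
        perturb(+1)                      # restore original weights

        grad_scale = (loss_plus - loss_minus) / (2 * eps)
        torch.manual_seed(seed)
        for p in params:
            z = torch.randn_like(p)
            p.data.add_(-lr * grad_scale * z)
            # Keep weights in a low-precision representation after the update.
            p.data.copy_(stochastic_round_quantize(p.data))
```

In this sketch the perturbation is regenerated from a saved seed rather than stored, so the optimizer keeps only scalar losses and no activation or gradient buffers, which is the usual source of the memory savings of forward-only fine-tuning over quantized first-order methods.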
Anthology ID:
2025.emnlp-main.271
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
5341–5359
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.271/
Cite (ACL):
Jiajun Zhou, Yifan Yang, Kai Zhen, Ziyue Liu, Yequan Zhao, Ershad Banijamali, Athanasios Mouchtaris, Ngai Wong, and Zheng Zhang. 2025. QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 5341–5359, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models (Zhou et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.271.pdf
Checklist:
 2025.emnlp-main.271.checklist.pdf