PE-QAT: Parameter-Efficient Quantization-Aware Training for Large Language Models

Shresth Mishra

PE-QAT: Parameter-Efficient Quantization-Aware Training for Large Language Models

Abstract

As large language models (LLMs) grow, their compute and memory demands become prohibitive for on-device deployment. Quantization is a crucial technique to shrink model footprint and accelerate inference, but aggressively low-bit weight-activation quantization schemes often sacrifice accuracy. Quantization Aware Training (QAT) is a commonly used paradigm to minimize quantization noise, but is extremely expensive to train and often unscalable to large models. We introduce PE-QAT, a parameter-efficient framework targeting per-channel 4-bit weight-activation quantization of LLMs, which aims to preserve model accuracy while significantly reducing resource requirements. The proposed method freezes the base model and trains lightweight LoRA adapters by fake quantizing the merged-weight model, enabling PE-QAT to scale efficiently unlike full QAT. We apply fake quantization with Straight-Through Estimators (STE) to the merged weights, allowing the adapters to explicitly compensate for quantization noise during training. One of the biggest challenges with quantizing activations alongside weights is addressing outliers that are orders of magnitude larger than other activations, which inflate quantization scales and suppress lower-magnitude values. To mitigate the impact of severe activation outliers, PE-QAT jointly learns per-channel smoothing factors and symmetric activation clipping thresholds. PE-QAT retains accuracy within 0.11 percentage points of the full-precision baseline on Llama-2-7B zero-shot tasks while training only 1.26% of total parameters.

Anthology ID:: 2026.acl-srw.63
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Santosh T.Y.S.S., Juan Diego Rodriguez, Ona de Gibert
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 701–711
Language:
URL:: https://preview.aclanthology.org/ingest-acl/2026.acl-srw.63/
DOI:
Bibkey:
Cite (ACL):: Shresth Mishra. 2026. PE-QAT: Parameter-Efficient Quantization-Aware Training for Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 701–711, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: PE-QAT: Parameter-Efficient Quantization-Aware Training for Large Language Models (Mishra, ACL 2026)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingest-acl/2026.acl-srw.63.pdf

PDF Cite Search Fix data