GenPTQ: Green Post-Training Quantization for Large-Scale ASR Models with Mixed-Precision Bit Allocation

Beom Jin Kang, Hyun Kim


Abstract
Large-scale models have achieved state-of-the-art performance in automatic speech recognition (ASR), but their high memory and computation demands pose significant challenges for deployment. To address these challenges, weight-only quantization is widely adopted in large-scale models, where weights dominate memory usage, as it enables efficient compression with minimal accuracy degradation compared to activation quantization. Accordingly, most prior quantization studies for ASR models have focused on weights and employed quantization-aware training (QAT) to restore accuracy. However, QAT incurs substantial additional training costs, posing clear limitations for practical application to large-scale models. Moreover, despite the varying quantization sensitivity across layers, mixed-precision quantization (MPQ) remains underexplored in ASR. In this paper, we propose GenPTQ, a mixed-precision post-training quantization method that optimizes the trade-off among accuracy, model size, and optimization cost by leveraging gradient-based sensitivity measurement and transforming the search space into a continuous domain for efficient numerical optimization. Applied to Whisper and Conformer models across multiple speech datasets, GenPTQ achieves up to 89.1% model size reduction (2.5-bit average precision) with only a 0.8% increase in WER, and completes optimization in just 15 seconds. These results demonstrate its effectiveness for low-resource ASR deployment.
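The abstract mentions gradient-based sensitivity measurement and a continuous relaxation of the bit-width search space. The sketch below is a generic, illustrative take on that idea only, not the authors' GenPTQ algorithm: it scores each layer with a first-order, gradient-weighted quantization error, models how that score shrinks with bit-width, and then optimizes continuous per-layer bit-widths under an average-bit budget before rounding. All layer names, the 4^(-b) error model, and the AVG_BIT_BUDGET constant are assumptions introduced for the example.

# Minimal sketch (NOT the paper's algorithm): gradient-weighted per-layer
# sensitivity scores, followed by a continuous-relaxation bit allocation.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Stand-ins for per-layer weights and loss gradients of a pretrained model.
layers = {f"layer{i}": (rng.normal(size=(256, 256)),        # weight W_l
                        rng.normal(size=(256, 256)) * 0.1)  # grad dL/dW_l
          for i in range(4)}

def quantize(w, bits):
    """Uniform symmetric quantization of w to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

def sensitivity(w, g, bits):
    """First-order proxy: |g * (W - Q_b(W))| summed over the layer."""
    return float(np.abs(g * (w - quantize(w, bits))).sum())

# Fit a simple per-layer model s_l(b) ~= c_l * 4**(-b) from one probe point,
# so the allocation problem becomes smooth in the (continuous) bit-widths.
coeff = []
for w, g in layers.values():
    s4 = sensitivity(w, g, 4)
    coeff.append(s4 * 4 ** 4)           # c_l such that c_l * 4**(-4) = s4

AVG_BIT_BUDGET = 2.5                    # illustrative average-precision target
n = len(coeff)

def objective(b):                       # total predicted sensitivity
    return sum(c * 4.0 ** (-bi) for c, bi in zip(coeff, b))

cons = ({"type": "ineq", "fun": lambda b: AVG_BIT_BUDGET - np.mean(b)},)
res = minimize(objective, x0=np.full(n, AVG_BIT_BUDGET),
               bounds=[(2, 8)] * n, constraints=cons)

bit_alloc = {name: int(round(b)) for name, b in zip(layers, res.x)}
print(bit_alloc)                        # e.g. more bits to high-sensitivity layers

The continuous formulation lets a standard constrained optimizer (SLSQP here) balance bits across layers in seconds, which is the general motivation the abstract gives for moving the search into a continuous domain; how GenPTQ actually measures sensitivity and solves the allocation is described in the paper itself.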
Anthology ID:
2025.findings-emnlp.566
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10704–10718
URL:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.566/
DOI:
10.18653/v1/2025.findings-emnlp.566
Cite (ACL):
Beom Jin Kang and Hyun Kim. 2025. GenPTQ: Green Post-Training Quantization for Large-Scale ASR Models with Mixed-Precision Bit Allocation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 10704–10718, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
GenPTQ: Green Post-Training Quantization for Large-Scale ASR Models with Mixed-Precision Bit Allocation (Kang & Kim, Findings 2025)
PDF:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.566.pdf
Checklist:
2025.findings-emnlp.566.checklist.pdf