Diet-KIT: Post-Training Quantization for Speech LLMs

Danni Liu; Sai Koneru; Jan Niehues

doi:10.18653/v1/2026.iwslt-1.21

Abstract

We present Diet-KIT, a system for the IWSLT speech translation compression task under a strict 4 GB on-disk storage constraint, starting from the 16 GB Qwen2-Audio-7B base model. Compression is achieved with a sequential pipeline based on Half-Quadratic Quantization (HQQ). Based on systematic ablations, we find that 4-bit quantization preserves translation quality well, whereas 3-bit quantization induces a sharp performance cliff, precluding aggressive compression across the whole model. We further show that the embedding table tolerates 2-bit quantization with negligible loss, while the LM head requires higher precision. To satisfy the storage constraint, we propose a sensitivity-guided layer selection method that identifies MLP sublayers tolerant to 3-bit compression via a per-layer sensitivity analysis, which consistently outperforms manual and random layer selection. Finally, AWQ calibration is applied as a data-driven refinement stage. The final system achieves 3.98 GB on disk with COMET scores of 74.4 on en→de and 77.1 on en→zh, compared to 75.6 and 79.5 for the uncompressed fine-tuned model.
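The sensitivity-guided selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the `(name, num_params, sensitivity)` layer tuples, and the greedy lowest-sensitivity-first rule are all assumptions made for illustration, and the size accounting ignores quantization metadata such as scales and zero points.

```python
from typing import List, Tuple

# Hypothetical layer record: (name, parameter count, sensitivity score).
# A lower sensitivity score is assumed to mean the layer tolerates 3-bit better.
Layer = Tuple[str, int, float]


def pick_low_bit_layers(
    layers: List[Layer],
    budget_bits: int,
    high_bits: int = 4,
    low_bits: int = 3,
) -> Tuple[List[str], int]:
    """Greedily move the least sensitive layers to low_bits until the budget fits.

    Sizes count weight bits only (an assumption); real formats also store
    per-group scales and zero points.
    """
    # Start with every layer at the higher precision.
    total = sum(num_params * high_bits for _, num_params, _ in layers)
    chosen: List[str] = []

    # Visit layers from least to most sensitive.
    for name, num_params, _ in sorted(layers, key=lambda layer: layer[2]):
        if total <= budget_bits:
            break
        total -= num_params * (high_bits - low_bits)
        chosen.append(name)

    if total > budget_bits:
        raise ValueError("budget not reachable with the given bit widths")
    return chosen, total


if __name__ == "__main__":
    toy_layers = [("mlp.0", 100, 0.9), ("mlp.1", 100, 0.1), ("mlp.2", 100, 0.5)]
    print(pick_low_bit_layers(toy_layers, budget_bits=1050))
```

In this toy setup, only as many layers as needed are moved to 3-bit, which mirrors the abstract's point that 3-bit quantization is applied selectively rather than across the whole model.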

Anthology ID: 2026.iwslt-1.21
Volume: Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)
Month: July
Year: 2026
Address: San Diego, USA (in-person and online)
Editors: Elizabeth Salesky, Antonios Anastasopoulos, Matteo Negri, Marcello Federico
Venues: IWSLT | WS
SIG: SIGSLT
Publisher: Association for Computational Linguistics
Pages: 189–196
URL: https://preview.aclanthology.org/corrections-2026-06/2026.iwslt-1.21/
DOI: 10.18653/v1/2026.iwslt-1.21
Cite (ACL): Danni Liu, Sai Koneru, and Jan Niehues. 2026. Diet-KIT: Post-Training Quantization for Speech LLMs. In Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026), pages 189–196, San Diego, USA (in-person and online). Association for Computational Linguistics.
Cite (Informal): Diet-KIT: Post-Training Quantization for Speech LLMs (Liu et al., IWSLT 2026)
PDF: https://preview.aclanthology.org/corrections-2026-06/2026.iwslt-1.21.pdf
