AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models

Sangjun Lee, Seung-taek Woo, Jun-gyu Jin, Changhun Lee, Eunhyeok Park


Abstract
To enable broader deployment of Large Language Models (LLMs), it is essential to identify the best-performing model under strict memory constraints. We present AMQ, Automated Mixed-Precision Weight-Only Quantization, a framework that assigns layer-wise quantization bit-widths to optimally balance model quality and memory usage. However, the combinatorial search space, with over 10^100 possible configurations, makes conventional black-box optimization infeasible. AMQ overcomes this challenge through four key innovations: (1) search space pruning using prior knowledge to exclude unpromising configurations, (2) quantization proxy to bypass costly format conversions during search, (3) quality predictor to minimize evaluation overhead, and (4) iterative search-and-update strategy for fast and stable convergence. By integrating these components, AMQ efficiently explores the quality–efficiency landscape, reaching the Pareto frontier and yielding LLMs that are both compact and high-performing.
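To make the search problem concrete, the sketch below shows a toy constrained search over per-layer bit-widths, in the spirit of the framework described above. It is only an illustrative stand-in, not the paper's method: the names (predict_quality, LAYERS, BIT_CHOICES, MEMORY_BUDGET) and the simple evolutionary loop are assumptions for demonstration, and the quality predictor is a placeholder score rather than a trained model.

```python
# Toy mixed-precision bit-width search under a memory budget.
# Purely illustrative; all names and the scoring function are hypothetical.
import random

LAYERS = 32                 # hypothetical number of quantizable layers
BIT_CHOICES = (2, 3, 4)     # candidate per-layer bit-widths
MEMORY_BUDGET = 3.2         # allowed average bits per weight

def memory_cost(config):
    """Average bits per weight for a per-layer bit-width assignment."""
    return sum(config) / len(config)

def predict_quality(config):
    """Placeholder for a quality predictor: a toy score that favors
    higher bit-widths in earlier layers (purely illustrative)."""
    return sum(b * (LAYERS - i) for i, b in enumerate(config))

def random_feasible():
    """Sample a random configuration that satisfies the memory budget."""
    while True:
        cfg = [random.choice(BIT_CHOICES) for _ in range(LAYERS)]
        if memory_cost(cfg) <= MEMORY_BUDGET:
            return cfg

def mutate(config):
    """Change the bit-width of one randomly chosen layer."""
    cfg = list(config)
    cfg[random.randrange(LAYERS)] = random.choice(BIT_CHOICES)
    return cfg

def search(iterations=500, population=20):
    """Simple evolutionary loop: mutate a good parent, keep the child
    if it fits the budget, and replace the worst member of the pool."""
    pool = [random_feasible() for _ in range(population)]
    for _ in range(iterations):
        parent = max(random.sample(pool, 3), key=predict_quality)
        child = mutate(parent)
        if memory_cost(child) <= MEMORY_BUDGET:
            worst = min(range(len(pool)), key=lambda i: predict_quality(pool[i]))
            pool[worst] = child
    return max(pool, key=predict_quality)

if __name__ == "__main__":
    best = search()
    print("best config:", best, "avg bits:", round(memory_cost(best), 2))
```

Even this toy version shows why the real search space (bit-width choices raised to the number of layers) explodes combinatorially, which is what motivates the pruning, proxy, and predictor components listed in the abstract.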
Anthology ID:
2025.emnlp-main.1799
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
35520–35538
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1799/
Cite (ACL):
Sangjun Lee, Seung-taek Woo, Jun-gyu Jin, Changhun Lee, and Eunhyeok Park. 2025. AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 35520–35538, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models (Lee et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1799.pdf
Checklist:
 2025.emnlp-main.1799.checklist.pdf