AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models

Sangjun Lee, Seung-taek Woo, Jun-gyu Jin, Changhun Lee, Eunhyeok Park


Abstract
To enable broader deployment of Large Language Models (LLMs), it is essential to identify the best-performing model under strict memory constraints. We present AMQ, Automated Mixed-Precision Weight-Only Quantization, a framework that assigns layer-wise quantization bit-widths to optimally balance model quality and memory usage. However, the combinatorial search space, with over 10^100 possible configurations, makes conventional black-box optimization infeasible. AMQ overcomes this challenge through four key innovations: (1) search space pruning using prior knowledge to exclude unpromising configurations, (2) quantization proxy to bypass costly format conversions during search, (3) quality predictor to minimize evaluation overhead, and (4) iterative search-and-update strategy for fast and stable convergence. By integrating these components, AMQ efficiently explores the quality–efficiency landscape, reaching the Pareto frontier and yielding LLMs that are both compact and high-performing.
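To make the search problem concrete, the sketch below shows a toy constrained search over per-layer bit-widths, in the spirit of the framework described above. It is only an illustrative stand-in, not the paper's method: the names (predict_quality, LAYERS, BIT_CHOICES, MEMORY_BUDGET) and the simple evolutionary loop are assumptions for demonstration, and the quality predictor is a placeholder score rather than a trained model.

```python
# Toy mixed-precision bit-width search under a memory budget.
# Purely illustrative; all names and the scoring function are hypothetical.
import random

LAYERS = 32                 # hypothetical number of quantizable layers
BIT_CHOICES = (2, 3, 4)     # candidate per-layer bit-widths
MEMORY_BUDGET = 3.2         # allowed average bits per weight

def memory_cost(config):
    """Average bits per weight for a per-layer bit-width assignment."""
    return sum(config) / len(config)

def predict_quality(config):
    """Placeholder for a quality predictor: a toy score that favors
    higher bit-widths in earlier layers (purely illustrative)."""
    return sum(b * (LAYERS - i) for i, b in enumerate(config))

def random_feasible():
    """Sample a random configuration that satisfies the memory budget."""
    while True:
        cfg = [random.choice(BIT_CHOICES) for _ in range(LAYERS)]
        if memory_cost(cfg) <= MEMORY_BUDGET:
            return cfg

def mutate(config):
    """Change the bit-width of one randomly chosen layer."""
    cfg = list(config)
    cfg[random.randrange(LAYERS)] = random.choice(BIT_CHOICES)
    return cfg

def search(iterations=500, population=20):
    """Simple evolutionary loop: mutate a good parent, keep the child
    if it fits the budget, and replace the worst member of the pool."""
    pool = [random_feasible() for _ in range(population)]
    for _ in range(iterations):
        parent = max(random.sample(pool, 3), key=predict_quality)
        child = mutate(parent)
        if memory_cost(child) <= MEMORY_BUDGET:
            worst = min(range(len(pool)), key=lambda i: predict_quality(pool[i]))
            pool[worst] = child
    return max(pool, key=predict_quality)

if __name__ == "__main__":
    best = search()
    print("best config:", best, "avg bits:", round(memory_cost(best), 2))
```

Even this toy version shows why the real search space (bit-width choices raised to the number of layers) explodes combinatorially, which is what motivates the pruning, proxy, and predictor components listed in the abstract.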
Anthology ID:
2025.emnlp-main.1799
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
35520–35538
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1799/
Cite (ACL):
Sangjun Lee, Seung-taek Woo, Jun-gyu Jin, Changhun Lee, and Eunhyeok Park. 2025. AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 35520–35538, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models (Lee et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1799.pdf
Checklist:
 2025.emnlp-main.1799.checklist.pdf