Seung-taek Woo


2025

AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models
Sangjun Lee | Seung-taek Woo | Jun-gyu Jin | Changhun Lee | Eunhyeok Park
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

To enable broader deployment of Large Language Models (LLMs), it is essential to identify the best-performing model under strict memory constraints. We present AMQ, Automated Mixed-Precision Weight-Only Quantization, a framework that assigns layer-wise quantization bit-widths to optimally balance model quality and memory usage. However, the combinatorial search space, with over 10^100 possible configurations, makes conventional black-box optimization infeasible. AMQ overcomes this challenge through four key innovations: (1) **search space pruning** using prior knowledge to exclude unpromising configurations, (2) **quantization proxy** to bypass costly format conversions during search, (3) **quality predictor** to minimize evaluation overhead, and (4) **iterative search-and-update** strategy for fast and stable convergence. By integrating these components, AMQ efficiently explores the quality–efficiency landscape, reaching the Pareto frontier and yielding LLMs that are both compact and high-performing.
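Since the abstract describes an iterative search over per-layer bit-widths under a memory budget, the toy sketch below illustrates that general idea with a simple evolutionary loop. It is a minimal sketch under assumed settings, not the paper's implementation: all names and values (`LAYERS`, `BIT_CHOICES`, `MEMORY_BUDGET`, `quality_predictor`) are hypothetical stand-ins, and the search-space pruning and quantization-proxy components are omitted.

```python
# Hypothetical sketch of a mixed-precision bit-width search.
# Not the AMQ implementation; all constants and scoring are illustrative.
import random

LAYERS = 32                # number of quantized layers (assumed)
BIT_CHOICES = [2, 3, 4]    # candidate per-layer bit-widths (assumed)
MEMORY_BUDGET = 3.2        # allowed average bits per weight (assumed)

def memory_cost(config):
    """Average bits per weight for a per-layer bit-width assignment."""
    return sum(config) / len(config)

def quality_predictor(config):
    """Stand-in for a learned quality predictor. This toy score simply
    rewards higher precision in earlier layers (purely illustrative)."""
    return sum(bits * (LAYERS - i) for i, bits in enumerate(config))

def random_config():
    return [random.choice(BIT_CHOICES) for _ in range(LAYERS)]

def mutate(config):
    """Flip one layer's bit-width to a random alternative."""
    child = config[:]
    child[random.randrange(LAYERS)] = random.choice(BIT_CHOICES)
    return child

def search(iterations=1000, population=20):
    # Seed a pool with random configurations that fit the memory budget.
    pool = [c for c in (random_config() for _ in range(200))
            if memory_cost(c) <= MEMORY_BUDGET][:population]
    for _ in range(iterations):
        # Tournament-select a parent, mutate it, keep the child only if
        # it fits the budget and beats the current worst configuration.
        parent = max(random.sample(pool, 3), key=quality_predictor)
        child = mutate(parent)
        if memory_cost(child) <= MEMORY_BUDGET:
            worst = min(range(len(pool)),
                        key=lambda i: quality_predictor(pool[i]))
            if quality_predictor(child) > quality_predictor(pool[worst]):
                pool[worst] = child
    return max(pool, key=quality_predictor)

if __name__ == "__main__":
    best = search()
    print("bit-widths:", best, "| avg bits:", round(memory_cost(best), 2))
```

In this toy form, the predictor replaces expensive model evaluation, mirroring the abstract's point that a quality predictor and proxy are needed because directly quantizing and benchmarking each of the ~10^100 candidate configurations is infeasible.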