AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models
Sangjun Lee, Seung-taek Woo, Jun-gyu Jin, Changhun Lee, Eunhyeok Park
Abstract
To enable broader deployment of Large Language Models (LLMs), it is essential to identify the best-performing model under strict memory constraints. We present AMQ, Automated Mixed-Precision Weight-Only Quantization, a framework that assigns layer-wise quantization bit-widths to optimally balance model quality and memory usage. However, the combinatorial search space, with over 10^100 possible configurations, makes conventional black-box optimization infeasible. AMQ overcomes this challenge through four key innovations: (1) **search space pruning** using prior knowledge to exclude unpromising configurations, (2) **quantization proxy** to bypass costly format conversions during search, (3) **quality predictor** to minimize evaluation overhead, and (4) **iterative search-and-update** strategy for fast and stable convergence. By integrating these components, AMQ efficiently explores the quality–efficiency landscape, reaching the Pareto frontier and yielding LLMs that are both compact and high-performing.
- Anthology ID:
- 2025.emnlp-main.1799
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 35520–35538
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1799/
- Cite (ACL):
- Sangjun Lee, Seung-taek Woo, Jun-gyu Jin, Changhun Lee, and Eunhyeok Park. 2025. AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 35520–35538, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models (Lee et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1799.pdf
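The abstract's claim that the search space exceeds 10^100 configurations follows from each layer independently choosing a bit-width. A minimal sketch, assuming (hypothetically; these numbers are not from the paper) 4 candidate bit-widths per layer and a model with a few hundred quantizable weight matrices:

```python
import math

def search_space_size(num_layers: int, num_bit_options: int) -> int:
    # Each layer independently picks one of the candidate bit-widths,
    # so the number of mixed-precision configurations grows exponentially.
    return num_bit_options ** num_layers

# Assumed illustrative values: 200 quantizable layers, 4 bit-width options.
size = search_space_size(num_layers=200, num_bit_options=4)
print(f"~10^{int(math.log10(size))} configurations")  # far beyond 10^100
```

At this scale, exhaustively evaluating configurations is impossible, which is why the paper's pruning, proxy, predictor, and iterative-search components are needed to make the search tractable.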