Sumin Song




2025

Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free
Euntae Choi | Sumin Song | Woosang Lim | Sungjoo Yoo
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Large Language Models (LLMs) face deployment challenges due to high computational costs, and while Post-Training Quantization (PTQ) offers a solution, existing rotation-based methods struggle at very low bit-widths such as 2-bit. We introduce a novel, training-free approach to constructing an improved rotation matrix that addresses the limitations of current methods. The key contributions include leveraging the Walsh-Hadamard transform with sequency ordering, which clusters similar frequency components and thereby reduces quantization error compared to standard Hadamard matrices, significantly improving performance. Furthermore, we propose a Grouped Sequency-arranged Rotation (GSR) using block-diagonal matrices with smaller Walsh blocks, effectively isolating the impact of outliers and achieving performance comparable to optimization-based methods without requiring any training. Our method demonstrates robust performance on reasoning tasks and in perplexity (PPL) on WikiText-2. It also improves results when applied on top of existing learned rotation techniques.
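The two ingredients named in this abstract, sequency ordering and a block-diagonal grouping, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `hadamard`, `sequency_walsh`, and `grouped_rotation` and the parameters `dim` and `group` are hypothetical, and the sketch only assumes the standard definitions (Sylvester construction of a Hadamard matrix, rows sorted by their number of sign changes, identical Walsh blocks placed on the diagonal).

```python
import numpy as np


def hadamard(n: int) -> np.ndarray:
    """Unnormalized Hadamard matrix via the Sylvester construction (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h


def sequency_walsh(n: int) -> np.ndarray:
    """Orthonormal Walsh matrix whose rows are sorted by sequency.

    Sequency is the number of sign changes along a row, loosely analogous
    to frequency for sinusoids.
    """
    h = hadamard(n)
    sign_changes = (np.diff(h, axis=1) != 0).sum(axis=1)
    order = np.argsort(sign_changes, kind="stable")
    return h[order] / np.sqrt(n)


def grouped_rotation(dim: int, group: int) -> np.ndarray:
    """Block-diagonal rotation built from identical group-sized Walsh blocks.

    Assumption for illustration: every block is the same sequency-ordered
    Walsh matrix, so each block mixes only the channels inside its group.
    """
    if dim % group:
        raise ValueError("dim must be divisible by group")
    return np.kron(np.eye(dim // group), sequency_walsh(group))


if __name__ == "__main__":
    r = grouped_rotation(dim=8, group=4)
    # An orthogonal rotation satisfies R @ R.T = I.
    print(np.allclose(r @ r.T, np.eye(8)))
```

Because each block touches only its own group of channels, a large value in one group cannot spread into the others, which is one way to read the abstract's remark about isolating the impact of outliers.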

Rotate, Clip, and Partition: Towards W2A4KV4 Quantization by Integrating Rotation and Learnable Non-uniform Quantizer
Euntae Choi | Sumin Song | Woosang Lim | Sungjoo Yoo
Findings of the Association for Computational Linguistics: EMNLP 2025

We propose Rotate, Clip, and Partition (RCP), a Quantization-Aware Training (QAT) approach that is the first to realize extreme compression of LLMs in the W2A4KV4 (2-bit weight, 4-bit activation, and 4-bit KV-cache) configuration. RCP integrates recent rotation techniques with a novel non-uniform weight quantizer design, which we derive by theoretically and empirically analyzing the impact of rotation on the non-uniformity of the weight distribution. Our weight quantizer, Learnable Direct Partitioning (LDP), introduces learnable parameters to directly learn non-uniform intervals jointly with the LLM weights. We also present a GPU kernel supporting GEMV on non-uniform W2A4 as a proof of concept. Experiments show that RCP can compress LLaMA-2-7B to W2A4KV4 with a loss of only 2.84 WikiText-2 PPL and a 5.29 times smaller memory footprint. Furthermore, RCP can quantize challenging mobile-targeted LLaMA-3.2 models and the domain-specific WizardCoder-7B and MetaMath-7B without critical problems such as convergence failure and repetition. Code is available at https://github.com/songsm921/RCP.
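To make the idea of non-uniform intervals concrete, here is a minimal sketch of a generic 2-bit non-uniform quantizer. It is not the paper's LDP: the thresholds are fixed inputs rather than learned parameters, and the function name `quantize_nonuniform`, the choice of the bin mean as the reconstruction level, and the sample values are all assumptions made for illustration.

```python
import numpy as np


def quantize_nonuniform(weights: np.ndarray, thresholds) -> tuple[np.ndarray, np.ndarray]:
    """Assign each weight to a bin defined by sorted thresholds.

    Three thresholds give four bins, i.e. a 2-bit code per weight.
    Returns the integer codes and one reconstruction level per bin
    (the mean of the weights that fall into that bin, or 0 if empty).
    """
    thresholds = np.sort(np.asarray(thresholds, dtype=float))
    codes = np.searchsorted(thresholds, weights)
    levels = np.zeros(len(thresholds) + 1)
    for k in range(len(levels)):
        members = weights[codes == k]
        if members.size:
            levels[k] = members.mean()
    return codes, levels


if __name__ == "__main__":
    w = np.array([-1.0, -0.9, -0.1, 0.1, 0.2, 2.0])
    # Unevenly spaced thresholds: the bin widths differ, unlike a uniform grid.
    codes, levels = quantize_nonuniform(w, thresholds=[-0.5, 0.0, 0.5])
    dequantized = levels[codes]
    print(codes, dequantized)
```

In a QAT setting such as the one the abstract describes, the interval boundaries would be trained together with the model weights instead of being supplied by hand as they are here.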