@inproceedings{wang-etal-2025-weight,
title = "Weight-Aware Activation Sparsity with Constrained {B}ayesian Optimization Scheduling for Large Language Models",
author = "Wang, Ming and
Zhang, Miao and
Liu, Xuebo and
Nie, Liqiang",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.57/",
pages = "1086--1098",
ISBN = "979-8-89176-332-6",
abstract = "Activation sparsity provides a dynamic, input-dependent alternative to weight pruning for accelerating inference in large language models (LLMs), effectively reducing unnecessary computations and memory accesses during the forward pass. Despite its promise, existing activation sparsification methods suffer from two major limitations: (1) solely relying on activation magnitude for sparsification, ignoring the coupling influence with the corresponding weights, (2) applying uniform sparsity rates across all blocks without considering block-wise sparsity sensitivity. To address these issues, this paper proposes a novel training-free weight-aware activation sparsity framework, called **WAS**. Firstly, with analyzing the coupling relationship between weight and activation, we introduce a weight-aware scoring method to measure the activation importance in sparsification. Then, a novel constrained Bayesian optimization algorithm is further devised to set a suitable sparsity ratio for all blocks based on the sparsity sensitivity. Finally, we implement a custom GPU sparsity kernel to support the resulting sparsity patterns for wall-clock decoding speed-ups. Our **WAS** achieves competitive performance at 60{\%} model-level sparsity and significantly outperforms prior methods at higher sparsity levels, achieving up to 1.68{\texttimes} inference speed-up{---}at no retraining or weight update. Codes are available at https://github.com/HITSZ-Miao-Group/WAS."
}
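
To illustrate the weight-aware scoring idea summarized in the abstract, here is a minimal sketch, not the paper's actual implementation: each intermediate channel is scored by the product of its activation magnitude and the column norm of the downstream projection weight, and the lowest-scoring channels are zeroed per token. The function name `weight_aware_sparsify`, the tensor shapes, and the per-token thresholding scheme are illustrative assumptions rather than details taken from the WAS codebase.

```python
# Illustrative sketch only: weight-aware activation scoring for an MLP block,
# assuming score = |activation| * column norm of the downstream weight matrix.
# Names and shapes are hypothetical, not from the WAS repository.
import torch

def weight_aware_sparsify(x: torch.Tensor, w_down: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the lowest-scoring hidden channels of x before the down projection.

    x:        (batch, hidden) intermediate activations
    w_down:   (out_dim, hidden) down-projection weight
    sparsity: fraction of channels to drop per token, e.g. 0.6
    """
    # Couple activation magnitude with the weights it multiplies: channels whose
    # outgoing weights carry little mass contribute little to the block output.
    col_norm = w_down.norm(dim=0)                  # (hidden,)
    scores = x.abs() * col_norm                    # (batch, hidden)

    k = int(sparsity * x.shape[-1])                # channels to prune per token
    if k == 0:
        return x
    # Per-token threshold: keep only channels scoring above the k-th smallest score.
    thresh = scores.kthvalue(k, dim=-1, keepdim=True).values
    mask = scores > thresh
    return x * mask

# Usage with Llama-7B-like dimensions: sparsify activations before the down projection.
x = torch.randn(4, 11008)
w_down = torch.randn(4096, 11008)
x_sparse = weight_aware_sparsify(x, w_down, sparsity=0.6)
```

In this sketch the sparsity ratio is a fixed input; per the abstract, WAS instead schedules a block-specific ratio via constrained Bayesian optimization based on each block's sparsity sensitivity, and executes the resulting patterns with a custom GPU kernel.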