Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models

Juan Pablo Munoz, Jinjie Yuan, Nilesh Jain


Abstract
Large pre-trained models have achieved outstanding results in sequence modeling. The Transformer block and its attention mechanism have been the main drivers of the success of these models. Recently, alternative architectures, such as Selective Structured State Space Models (SSMs), have been proposed to address the inefficiencies of Transformers. This paper explores the compression of SSM-based models, particularly Mamba and its hybrids. We study the sensitivity of these models to the removal of selected components at different granularities to reduce the model size and computational overhead, thus improving their efficiency while maintaining accuracy. The proposed solutions, collectively referred to as Mamba-Shedder, achieve a speedup of up to 1.4x during inference, demonstrating that model efficiency can be improved by eliminating several redundancies with minimal impact on the overall model performance. The code is available at https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning.
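To illustrate the general idea behind this kind of structured removal, the sketch below implements a greedy "remove the block whose absence hurts least" loop on a toy stack of residual blocks. Everything here (the ToyModel, the MSE-based proxy score, the calibration batch) is an illustrative assumption for exposition only; it is not Mamba-Shedder's actual scoring criterion, granularity set, or codebase, which are described in the paper and repository linked above.

```python
# Minimal sketch of block-level removal ("shedding") for a pre-trained
# sequence model. Model, score, and data are toy stand-ins; the paper's
# method operates on Mamba/SSM blocks and channels with its own criteria.
import copy
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Stand-in for a Mamba/SSM block: y = x + f(x)."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU())

    def forward(self, x):
        return x + self.f(x)

class ToyModel(nn.Module):
    def __init__(self, dim=64, depth=8):
        super().__init__()
        self.blocks = nn.ModuleList(ResidualBlock(dim) for _ in range(depth))
        self.head = nn.Linear(dim, dim)

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return self.head(x)

@torch.no_grad()
def score(model, batch, reference):
    """Proxy quality score: negative MSE against the uncompressed model."""
    return -nn.functional.mse_loss(model(batch), reference(batch)).item()

def shed_least_sensitive(model, batch, reference, num_to_remove=2):
    """Greedily drop the blocks whose removal degrades the score the least."""
    for _ in range(num_to_remove):
        best_idx, best_score = None, float("-inf")
        for i in range(len(model.blocks)):
            candidate = copy.deepcopy(model)
            del candidate.blocks[i]  # residual form keeps shapes valid after removal
            s = score(candidate, batch, reference)
            if s > best_score:
                best_idx, best_score = i, s
        del model.blocks[best_idx]
    return model

torch.manual_seed(0)
full = ToyModel().eval()
reference = copy.deepcopy(full)          # frozen copy used only for the proxy score
calib = torch.randn(16, 32, 64)          # toy calibration batch
pruned = shed_least_sensitive(full, calib, reference, num_to_remove=2)
print(f"Blocks remaining: {len(pruned.blocks)}")
```

In practice, the score would be replaced by a task or language-modeling metric on calibration data, and the removal granularity could range from whole blocks down to sub-components, which is the sensitivity study the abstract refers to.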
Anthology ID:
2025.naacl-long.195
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3851–3863
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.195/
Cite (ACL):
Juan Pablo Munoz, Jinjie Yuan, and Nilesh Jain. 2025. Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3851–3863, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models (Munoz et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.195.pdf