Break Through the Compression Bottleneck: From Theory to Practice

Xiusheng Huang, Lu Wang, Yequan Wang, Jun Zhao, Kang Liu


Abstract
As the parameter count of language models continues to grow, effective model compression is required to reduce their computational and memory overhead. Existing compression methods hit a bottleneck: as the compression ratio increases, performance degrades significantly. Low-rank decomposition and quantization are two prominent compression methods that have each been shown to substantially reduce the computational and memory requirements of Large Language Models (LLMs) while maintaining model accuracy. Combining them is therefore a natural way to try to break through the existing compression bottleneck. How the two methods interact when combined, however, remains a critical open question for developers: many assume they are orthogonal, meaning that their combination introduces no additional error beyond the errors each method introduces independently. This paper provides the first mathematical proof that low-rank decomposition and quantization are non-orthogonal. We validate this finding through a series of experiments on large language models, which show that naively combining the two methods leads to significant performance degradation. We further propose a novel approach, the Diagonal Adhesive Method (DAM), which effectively combines the two methods and mitigates the performance loss. Our research provides deep insights into model compression and lays a solid theoretical and experimental foundation for future related studies.
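The orthogonality question raised in the abstract can be probed numerically on a toy matrix. The following is a minimal sketch under stated assumptions, not the authors' experimental setup and not DAM: it assumes truncated SVD as the low-rank decomposition, symmetric per-tensor 4-bit round-to-nearest quantization, and a random Gaussian matrix as a stand-in for a weight matrix. The helper names `low_rank`, `quantize`, and `fro` are hypothetical.

```python
import numpy as np

def low_rank(W, r):
    """Best rank-r approximation via truncated SVD, returned as factors (A, B)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * s[:r]   # absorb the singular values into the left factor
    B = Vt[:r, :]
    return A, B

def quantize(X, bits=4):
    """Symmetric per-tensor uniform quantization (assumed scheme), dequantized."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(X).max() / qmax
    return np.clip(np.round(X / scale), -qmax, qmax) * scale

def fro(X):
    """Frobenius norm as a plain Python float."""
    return float(np.linalg.norm(X))

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))   # toy stand-in for a weight matrix

A, B = low_rank(W, r=16)
err_lowrank = fro(W - A @ B)                        # low-rank decomposition alone
err_quant = fro(W - quantize(W))                    # quantization alone
err_combined = fro(W - quantize(A) @ quantize(B))   # quantize the low-rank factors

print(f"low-rank only : {err_lowrank:.4f}")
print(f"quantize only : {err_quant:.4f}")
print(f"combined      : {err_combined:.4f}")
```

Note that `err_quant` is measured on `W` itself, whereas in the combined case quantization acts on the factors `A` and `B`, whose value distributions differ from that of `W`. Comparing the three numbers across ranks and bit widths is one rough way to explore how far the combined error departs from what the two individual errors would suggest.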
Anthology ID:
2026.findings-acl.1557
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
31125–31142
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1557/
Cite (ACL):
Xiusheng Huang, Lu Wang, Yequan Wang, Jun Zhao, and Kang Liu. 2026. Break Through the Compression Bottleneck: From Theory to Practice. In Findings of the Association for Computational Linguistics: ACL 2026, pages 31125–31142, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Break Through the Compression Bottleneck: From Theory to Practice (Huang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1557.pdf
Checklist:
 2026.findings-acl.1557.checklist.pdf