CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
Ziyue Liu, Ruijie Zhang, Zhengyang Wang, Mingsong Yan, Zi Yang, Paul D. Hovland, Bogdan Nicolae, Franck Cappello, Sui Tang, Zheng Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
The full-size MLPs and the projection layers in attention account for a tremendous share of the parameters in large language models (LLMs) and consume extensive computational resources during pre-training. We empirically observe that the activations of pre-trained LLMs exhibit a low-rank property. Motivated by this observation, we propose **CoLA** and its memory-efficient implementation, **CoLA-M**, which replace these full-size layers with compute-efficient **auto-encoders** that naturally enforce low-rank activations throughout training. This fundamental architectural change eliminates activation redundancy and significantly boosts model capacity and training efficiency. Experiments on LLaMA models with 60 million to 7 billion parameters show that CoLA reduces the computing cost by 2× and improves training throughput by 1.86× while maintaining full-rank-level performance. CoLA-M further reduces memory cost without sacrificing throughput, offering a pre-training approach with collectively superior parameter, computing, and memory efficiency. The resulting LLMs are also 2× smaller, enabling faster inference with lower memory cost on resource-constrained platforms.
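The abstract describes replacing full-size linear layers with auto-encoder-style bottlenecks so that the activations stay low-rank by construction. The PyTorch sketch below is only a rough illustration of that general idea, not the authors' actual implementation: the class name `LowRankActivationLayer`, the rank choice, and the use of SiLU are assumptions made for the example.

```python
import torch
import torch.nn as nn


class LowRankActivationLayer(nn.Module):
    """Illustrative stand-in for a full-size d_in -> d_out linear layer.

    Instead of one dense weight, the input is projected down to a small
    rank-r bottleneck, passed through a nonlinearity (so the activations
    themselves are low-rank), and then projected back up. The class name,
    rank, and SiLU nonlinearity are assumptions, not taken from the paper.
    """

    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)   # d_in -> r
        self.act = nn.SiLU()                            # nonlinearity on the rank-r code
        self.up = nn.Linear(rank, d_out, bias=False)    # r -> d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.down(x)))


if __name__ == "__main__":
    d_model, rank = 1024, 256                           # hypothetical sizes
    full = nn.Linear(d_model, d_model, bias=False)
    low_rank = LowRankActivationLayer(d_model, d_model, rank)

    n_full = sum(p.numel() for p in full.parameters())
    n_low = sum(p.numel() for p in low_rank.parameters())
    # With rank = d_model / 4, the bottleneck layer has roughly 2x fewer parameters.
    print(f"full: {n_full:,} params, low-rank: {n_low:,} params "
          f"({n_full / n_low:.1f}x fewer)")
```

With these hypothetical sizes the bottleneck layer halves the parameter count of the dense layer it replaces, which is consistent in spirit with the roughly 2× smaller models reported in the abstract, though the paper's exact layer design and rank settings may differ.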