Run LoRA Run: Faster and Lighter LoRA Implementations

Daria Cherniuk, Aleksandr Mikhalev, Ivan Oseledets


Abstract
LoRA is a technique that reduces the number of trainable parameters in a neural network by adding low-rank adapters to linear layers. The technique is used both for fine-tuning and for training large transformer models from scratch. This paper presents the RunLoRA framework for efficient implementations of LoRA, which significantly improves the speed of neural network training and fine-tuning with low-rank adapters. The proposed implementation optimizes the computation of LoRA operations by selecting the best forward and backward computation graphs according to FLOP and time estimates, which depend on the shape of the corresponding linear layer's weights, the input dimensions, and the LoRA rank. This results in faster training without sacrificing accuracy. Experiments show speedups ranging from 10% to 28% on various transformer models.
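To make the idea concrete, below is a minimal, hypothetical PyTorch sketch, not the RunLoRA API: the class name FlopAwareLoRALinear and all of its details are illustrative assumptions. It shows a LoRA linear layer with a frozen base weight W and trainable low-rank factors A and B that picks between two mathematically equivalent adapter forward paths, (x A^T) B^T versus x (B A)^T, by comparing their FLOP counts, in the spirit of the graph selection described in the abstract.

```python
# Hypothetical sketch of FLOP-based selection between equivalent LoRA forward graphs.
# Illustrative only; RunLoRA's actual implementation and API may differ.
import torch
import torch.nn as nn


class FlopAwareLoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank, alpha=1.0):
        super().__init__()
        # Frozen base weight W (d_out x d_in), as in standard LoRA fine-tuning.
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)
        self.weight.requires_grad_(False)
        # Trainable low-rank factors: A (r x d_in) small random, B (d_out x r) zeros,
        # so the adapter starts as a no-op (usual LoRA convention).
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        tokens = x.numel() // x.shape[-1]        # batch size * sequence length
        r, d_in = self.lora_A.shape
        d_out = self.lora_B.shape[0]
        # FLOPs of "project then expand": (x A^T) B^T
        flops_split = 2 * tokens * r * (d_in + d_out)
        # FLOPs of "fuse factors first": x (B A)^T
        flops_fused = 2 * r * d_in * d_out + 2 * tokens * d_in * d_out
        base = x @ self.weight.T
        if flops_split <= flops_fused:
            update = (x @ self.lora_A.T) @ self.lora_B.T
        else:
            update = x @ (self.lora_B @ self.lora_A).T
        return base + self.scale * update


# Usage: for typical shapes (many tokens, small rank) the split path wins.
layer = FlopAwareLoRALinear(in_features=4096, out_features=4096, rank=8)
y = layer(torch.randn(2, 128, 4096))
```

The paper applies this kind of choice not only to the forward pass but also to the backward computation graph, and selects variants using both FLOP counts and time estimates.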
Anthology ID: 2025.acl-industry.15
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Georg Rehm, Yunyao Li
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 200–207
URL: https://preview.aclanthology.org/display_plenaries/2025.acl-industry.15/
Cite (ACL): Daria Cherniuk, Aleksandr Mikhalev, and Ivan Oseledets. 2025. Run LoRA Run: Faster and Lighter LoRA Implementations. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 200–207, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Run LoRA Run: Faster and Lighter LoRA Implementations (Cherniuk et al., ACL 2025)
PDF: https://preview.aclanthology.org/display_plenaries/2025.acl-industry.15.pdf