Acceleration of Backpropagation in Linear Layers of Transformer Models Based on Gradient Structure
Dmitrii Topchii, Alexander Panchenko, Viktoriia A. Chekalina
Abstract
Fine-tuning Transformer models is often dominated by the backward computation in linear layers. In many NLP tasks, input sequences are short and padded to a fixed context length, inducing structured sparsity in the output gradients. We propose Sparsity-Exploiting Backward Pass (SEBP), a heuristic method that reduces backward computation by exploiting this sparsity with negligible memory overhead. We show that, for short input sequences, the output gradients of BERT-based and LLaMA models exhibit pronounced sparsity, which leaves room to optimize the backward computation. We optimize the autograd function in the linear layers, significantly reducing the number of FLOPs during the backward pass. Our method achieves a backward pass speedup of approximately 2.15x for BERT-base on GLUE tasks and 1.99x for a 3B LLaMA model on reasoning benchmarks, while keeping memory usage nearly identical to regular PyTorch fine-tuning. Crucially, this speedup comes at no cost to performance: we show that our method matches standard convergence rates, offering a memory-efficient way to accelerate LLM fine-tuning.
- Anthology ID:
- 2026.eacl-srw.31
- Volume:
- Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Editors:
- Selene Baez Santamaria, Sai Ashish Somayajula, Atsuki Yamaguchi
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 426–436
- URL:
- https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.31/
- Cite (ACL):
- Dmitrii Topchii, Alexander Panchenko, and Viktoriia A. Chekalina. 2026. Acceleration of Backpropagation in Linear Layers of Transformer Models Based on Gradient Structure. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 426–436, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- Acceleration of Backpropagation in Linear Layers of Transformer Models Based on Gradient Structure (Topchii et al., EACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.31.pdf
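
The abstract's core idea is that padded positions leave structured sparsity in the output gradient of a linear layer, so that part of the backward matrix products can be skipped. This can be illustrated with a minimal NumPy sketch. It is not the authors' SEBP implementation, which modifies a PyTorch autograd function. The sketch assumes a bias-free layer `y = x @ weight.T` and assumes that gradient rows at padded positions are exactly zero; the function names `dense_backward` and `row_sparse_backward` are hypothetical.

```python
import numpy as np


def dense_backward(x, weight, grad_out):
    """Reference backward pass for y = x @ weight.T (no bias)."""
    grad_x = grad_out @ weight      # shape: (tokens, in_features)
    grad_w = grad_out.T @ x         # shape: (out_features, in_features)
    return grad_x, grad_w


def row_sparse_backward(x, weight, grad_out):
    """Same gradients, multiplying only the rows of grad_out that are not all-zero."""
    # Indices of token positions whose output gradient has any non-zero entry.
    active = np.flatnonzero(np.any(grad_out != 0, axis=1))
    g = grad_out[active]

    # Rows skipped here would have been zero in the dense product anyway.
    grad_x = np.zeros_like(x)
    grad_x[active] = g @ weight

    # Zero rows of grad_out contribute nothing to the weight gradient.
    grad_w = g.T @ x[active]
    return grad_x, grad_w, active.size


# Toy setup (assumed values): 16 token positions, of which only 5 are real tokens.
rng = np.random.default_rng(0)
tokens, real_tokens, d_in, d_out = 16, 5, 8, 4
x = rng.standard_normal((tokens, d_in))
weight = rng.standard_normal((d_out, d_in))
grad_out = np.zeros((tokens, d_out))
grad_out[:real_tokens] = rng.standard_normal((real_tokens, d_out))

ref_x, ref_w = dense_backward(x, weight, grad_out)
fast_x, fast_w, n_active = row_sparse_backward(x, weight, grad_out)
```

In this toy setup both matrix products run over 5 rows instead of 16 and produce the same gradients as the dense version. Whether a real speedup follows depends on the cost of detecting and gathering the active rows, which this sketch does not measure.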