Investigating ReLoRA: Effects on the Learning Dynamics of Small Language Models

Yuval Weiss, David Demitri Africa, Paula Buttery, Richard Diehl Martinez


Abstract
Parameter-efficient methods like LoRA have revolutionised large language model (LLM) fine-tuning. ReLoRA extends this idea to pretraining by repeatedly merging and reinitialising low-rank adapters, increasing cumulative rank while keeping updates cheap. This aligns well with observations that high-capacity models learn through locally low-rank trajectories that expand over time. By contrast, recent work suggests that small language models (SLMs) exhibit rank deficiencies and under-utilise their available dimensionality. This raises a natural question: can ReLoRA’s rank-expanding update rule steer SLMs toward healthier learning dynamics, mitigating rank bottlenecks in a capacity-constrained regime? We argue SLMs are an ideal testbed: they train quickly, enable controlled ablations, and make rank phenomena more measurable. We present the first systematic study of ReLoRA in SLMs (11M-66M parameters), evaluating both performance and learning dynamics. Across loss, Paloma perplexity, and BLiMP, we find that ReLoRA underperforms full-rank training, with gaps widening at larger scales. Analysis of proportional effective rank and condition numbers shows that ReLoRA amplifies existing rank deficiencies and induces ill-conditioned updates early in training. Our results suggest that while ReLoRA’s merge-and-restart strategy can expand ranks in larger models, it does not straightforwardly translate to capacity-limited SLMs, motivating adaptive-rank or hybrid-rank approaches for low-compute pretraining.
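The merge-and-reinitialise cycle the abstract describes can be illustrated in a few lines of PyTorch. The sketch below is a minimal, hypothetical illustration of the ReLoRA update rule (Lialin et al., 2023), not the authors' implementation: a frozen base weight is augmented with trainable low-rank factors, and at each restart the factor product is folded into the base weight before the factors are reinitialised. The class name, rank, and initialisation scales are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ReLoRALinear(nn.Module):
    """Minimal ReLoRA-style linear layer (illustrative sketch only).

    Holds a frozen base weight W plus trainable low-rank factors B and A.
    The effective weight is W + B @ A; merge_and_reinit() folds the current
    low-rank update into W and restarts the factors, so successive cycles
    can contribute updates in different subspaces, increasing the
    cumulative rank of the total change to W.
    """

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        # Base weight is frozen between merges; only A and B receive gradients.
        self.weight = nn.Parameter(
            torch.empty(out_features, in_features), requires_grad=False
        )
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.02)
        self.B = nn.Parameter(torch.zeros(out_features, rank))  # update starts at zero

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.weight + self.B @ self.A).T

    @torch.no_grad()
    def merge_and_reinit(self) -> None:
        # Fold the rank-r update into the base weight, then restart the
        # factors. The full ReLoRA recipe also prunes optimizer state and
        # re-warms the learning rate at each restart (omitted here).
        self.weight += self.B @ self.A
        nn.init.normal_(self.A, std=0.02)
        nn.init.zeros_(self.B)
```

In training, `merge_and_reinit()` would be invoked on a fixed schedule (e.g. every few thousand steps), alongside the optimizer-state reset and learning-rate warm restart that the full method prescribes.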
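The abstract's rank analysis rests on the effective rank of weight matrices. As a hedged sketch, assuming effective rank in the sense of Roy and Vetterli (2007) normalised by the maximum attainable rank (the paper's exact definition of "proportional effective rank" may differ), it can be computed from the singular values:

```python
import torch


def proportional_effective_rank(W: torch.Tensor) -> float:
    """Effective rank of W, as a fraction of its maximum possible rank.

    Effective rank here follows Roy & Vetterli (2007): the exponential of
    the Shannon entropy of the normalised singular-value distribution.
    Values near 1 mean the matrix spreads energy evenly across its
    available dimensions; values near 0 indicate rank deficiency.
    """
    s = torch.linalg.svdvals(W.float())
    p = s / s.sum()
    entropy = -(p * torch.log(p.clamp_min(1e-12))).sum()
    return (torch.exp(entropy) / min(W.shape)).item()
```

The condition-number diagnostic mentioned in the abstract can be read off the same singular values as the ratio of the largest to the smallest (or via `torch.linalg.cond`); large values flag the ill-conditioned updates the paper reports early in ReLoRA training.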
Anthology ID: 2025.blackboxnlp-1.9
Volume: Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Month: November
Year: 2025
Address: Suzhou, China
Editors: Yonatan Belinkov, Aaron Mueller, Najoung Kim, Hosein Mohebbi, Hanjie Chen, Dana Arad, Gabriele Sarti
Venues: BlackboxNLP | WS
Publisher: Association for Computational Linguistics
Pages: 163–175
URL: https://preview.aclanthology.org/ingest-emnlp/2025.blackboxnlp-1.9/
Cite (ACL): Yuval Weiss, David Demitri Africa, Paula Buttery, and Richard Diehl Martinez. 2025. Investigating ReLoRA: Effects on the Learning Dynamics of Small Language Models. In Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 163–175, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Investigating ReLoRA: Effects on the Learning Dynamics of Small Language Models (Weiss et al., BlackboxNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.blackboxnlp-1.9.pdf