Why LoRA Fails to Forget: Regularized Low-Rank Adaptation Against Backdoors in Language Models

Hoang-Chau Luong, Lingwei Chen


Abstract
Low-Rank Adaptation (LoRA) is widely used for parameter-efficient fine-tuning of large language models, but it is notably ineffective at removing backdoor behaviors from poisoned pretrained models when fine-tuning on a clean dataset. Contrary to the common belief that this weakness is caused primarily by low rank, we show that LoRA’s vulnerability is fundamentally spectral. Our analysis identifies two key factors: LoRA updates (i) possess insufficient spectral strength, with singular values far below those of pretrained weights, and (ii) exhibit unfavorable spectral alignment, weakly matching clean-task directions while retaining overlap with trigger-sensitive subspaces. We further establish a critical scaling threshold beyond which LoRA can theoretically suppress trigger-induced activations, and we show empirically that standard LoRA rarely reaches this regime. We introduce Regularized Low-Rank Adaptation (RoRA), which improves forgetting by increasing spectral strength and correcting alignment through clean-strengthened regularization, trigger-insensitive constraints, and post-training spectral rescaling. Experiments across multiple NLP benchmarks and attack settings show that RoRA substantially reduces attack success rates while maintaining clean accuracy.
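To make the two spectral quantities named in the abstract concrete, the following is a minimal NumPy sketch of how one might measure them on a single weight matrix. It is an illustration under stated assumptions, not the paper's method: the function names `spectral_strength_ratio` and `subspace_overlap`, the choice of the largest singular value as the measure of strength, and the energy-fraction definition of overlap are all hypothetical, and the matrices are random stand-ins rather than real model weights.

```python
import numpy as np


def spectral_strength_ratio(W, B, A):
    """Largest singular value of the LoRA update B @ A divided by that of W.

    Hypothetical diagnostic: a value far below 1 means the update is
    spectrally weak relative to the pretrained weight matrix.
    """
    delta = B @ A
    s_delta = np.linalg.svd(delta, compute_uv=False)  # sorted descending
    s_w = np.linalg.svd(W, compute_uv=False)
    return s_delta[0] / s_w[0]


def subspace_overlap(delta, directions):
    """Fraction of delta's squared Frobenius norm acting on span(directions).

    `directions` has shape (d_in, k); its columns span an input subspace,
    for example a set of assumed clean-task or trigger-sensitive directions.
    The result lies in [0, 1].
    """
    Q, _ = np.linalg.qr(directions)  # orthonormal basis for the subspace
    projected = delta @ Q            # restrict the update to that subspace
    return np.linalg.norm(projected) ** 2 / np.linalg.norm(delta) ** 2


# Random stand-ins (assumed shapes: 64x64 weight, rank-4 update).
rng = np.random.default_rng(0)
d, r = 64, 4
W = rng.standard_normal((d, d))
B = 0.01 * rng.standard_normal((d, r))  # small scale mimics a weak update
A = rng.standard_normal((r, d))

ratio = spectral_strength_ratio(W, B, A)
own_overlap = subspace_overlap(B @ A, A.T)  # row space of the update itself
random_overlap = subspace_overlap(B @ A, rng.standard_normal((d, r)))
print(ratio, own_overlap, random_overlap)
```

In this toy setup the ratio comes out well below 1, the overlap of the update with its own row space is 1 up to floating-point error, and the overlap with a random subspace falls somewhere between 0 and 1. Which subspaces count as clean-task or trigger-sensitive, and how the critical scaling threshold is defined, are specified in the paper itself and are not reproduced here.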
Anthology ID:
2026.findings-acl.1732
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
34692–34705
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1732/
Cite (ACL):
Hoang-Chau Luong and Lingwei Chen. 2026. Why LoRA Fails to Forget: Regularized Low-Rank Adaptation Against Backdoors in Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 34692–34705, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Why LoRA Fails to Forget: Regularized Low-Rank Adaptation Against Backdoors in Language Models (Luong & Chen, Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1732.pdf
Checklist:
 2026.findings-acl.1732.checklist.pdf