SCALE: Upscaled Continual Learning of Large Language Models

Jin-woo Lee, Junhwa Choi, Bongkyu Hwang, Jinho Choo, Bogun Kim, Jeongseon Yi, Joonseok Lee, DongYoung Jung, Jaeseon Park, Kyoungwon Park, Suk-hoon Jung


Abstract
We revisit continual pre-training for large language models and argue that progress now depends less on scaling parameters than on scaling the right structure. We introduce SCALE, a width upscaling architecture that inserts lightweight expansions into linear modules while freezing all pre-trained parameters, preserving residual and attention topologies and increasing capacity without perturbing the base model’s original functionality. SCALE follows two principles: Persistent Preservation, which maintains the base model’s behavior via preservation-oriented initialization and freezing of the pre-trained weights, and Collaborative Adaptation, which trains only selected expansion components to acquire new knowledge with minimal interference. We instantiate these ideas as SCALE-Preserve (preservation-first), SCALE-Adapt (adaptation-first), and SCALE-Route, an optional routing extension that performs token-level routing between preservation and adaptation heads. On a controlled synthetic biography benchmark, SCALE reduces the severe forgetting seen in depth expansion while still learning new knowledge. In continual pre-training on a Korean corpus, SCALE variants forget less on English evaluations and achieve competitive gains on Korean benchmarks, yielding the best overall stability-plasticity trade-off. We further analyze when preservation holds provably and why combining preservation and adaptation stabilizes optimization relative to standard continual learning.
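The abstract says SCALE widens linear modules, freezes the pre-trained weights, and uses a preservation-oriented initialization, but it does not spell out the construction. The sketch below is therefore only an illustration of one common way to obtain function-preserving width expansion, not the authors' implementation: the hidden width of a small feed-forward block is enlarged, the new output-side columns start at zero, and a mask marks which entries would be trainable. All names (`expand_width`, `trainable_mask`), the zero initialization, the 0.02 standard deviation, and the toy dimensions are assumptions made for this example.

```python
import random

def linear(weights, bias, x):
    """Compute y = W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def mlp(w_up, b_up, w_down, b_down, x):
    """Two-layer feed-forward block: down(relu(up(x)))."""
    return linear(w_down, b_down, relu(linear(w_up, b_up, x)))

def expand_width(w_up, b_up, w_down, extra, rng):
    """Widen the hidden dimension by `extra` units without changing the output.

    Assumed scheme (hypothetical, not taken from the paper): new rows of the
    up-projection get small random values so they can learn, while the
    matching new columns of the down-projection start at zero, so the added
    units contribute nothing until they are trained.
    """
    d_in = len(w_up[0])
    new_up = [row[:] for row in w_up] + [
        [rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(extra)
    ]
    new_b_up = b_up[:] + [0.0] * extra
    new_down = [row[:] + [0.0] * extra for row in w_down]
    return new_up, new_b_up, new_down

def trainable_mask(rows, cols, old_rows, old_cols):
    """1 marks an expansion entry that may be updated; 0 marks a frozen pre-trained entry."""
    return [[0 if (r < old_rows and c < old_cols) else 1 for c in range(cols)]
            for r in range(rows)]

# Toy "pre-trained" block with random weights (illustrative sizes only).
rng = random.Random(0)
d_in, d_hidden, d_out, extra = 4, 6, 4, 2
w_up = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
b_up = [rng.gauss(0, 1) for _ in range(d_hidden)]
w_down = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
b_down = [rng.gauss(0, 1) for _ in range(d_out)]

x = [rng.gauss(0, 1) for _ in range(d_in)]
before = mlp(w_up, b_up, w_down, b_down, x)

w_up2, b_up2, w_down2 = expand_width(w_up, b_up, w_down, extra, rng)
after = mlp(w_up2, b_up2, w_down2, b_down, x)

# Largest change in any output coordinate caused by the expansion.
max_diff = max(abs(a - b) for a, b in zip(before, after))

# Only the two new columns of the down-projection are marked trainable.
mask_down = trainable_mask(d_out, d_hidden + extra, d_out, d_hidden)
```

In this sketch the widened block reproduces the original outputs at initialization because every added hidden unit is multiplied by a zero column. Multiplying gradients by `mask_down` (and by the analogous mask for the up-projection) during training would keep the pre-trained entries fixed while the expansion entries learn.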
Anthology ID:
2026.findings-acl.2037
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
41006–41020
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2037/
Cite (ACL):
Jin-woo Lee, Junhwa Choi, Bongkyu Hwang, Jinho Choo, Bogun Kim, Jeongseon Yi, Joonseok Lee, DongYoung Jung, Jaeseon Park, Kyoungwon Park, and Suk-hoon Jung. 2026. SCALE: Upscaled Continual Learning of Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 41006–41020, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
SCALE: Upscaled Continual Learning of Large Language Models (Lee et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2037.pdf
Checklist:
2026.findings-acl.2037.checklist.pdf