Disentangling Continued Pre-Training: Attention-Driven Routing and Semantic Hub Preservation in Language Adaptation

Khanh-Tung Tran, Vinh-Khanh Tran, Barry O'Sullivan, Hoang D. Nguyen


Abstract
Continued Pre-Training (CPT) enables Large Language Models (LLMs) to acquire second-language capabilities, yet the underlying mechanisms remain poorly understood. In this work, we investigate how CPT adapts model representations across diverse language families and scripts, model sizes, and architectures. We find that second-language abilities emerge through a selective adaptation mechanism: task-solving capabilities are preserved in a “semantic hub”, while interface layers retarget to shifted token distributions. Layer-swapping experiments demonstrate that semantic understanding can be surgically transferred between base and CPT models with minimal loss (e.g., swapping 50% of model parameters reduces performance by only 0.3%). Furthermore, we establish that attention components route language adaptation: they undergo larger parameter changes than feedforward networks, correlate more strongly with language-specific neurons, and substantially degrade performance when surgically replaced. Overall, our work provides a mechanistic understanding of CPT, guiding future work on efficient strategies for language adaptation.
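The two measurements named in the abstract, layer swapping and per-component parameter change, can be illustrated with a minimal sketch. It is not the authors' code: the parameter naming scheme (`layers.<i>.attn.` / `layers.<i>.ffn.`), the helper names `swap_layers` and `change_by_component`, and the toy weights are all assumptions made for illustration, and plain Python lists stand in for real weight tensors.

```python
import math
import re

# Hypothetical naming scheme: "layers.<index>.<attn|ffn>.<param>"
LAYER_RE = re.compile(r"layers\.(\d+)\.(attn|ffn)\.")

def swap_layers(base, cpt, layers_from_base):
    """Return a copy of the CPT weights with the listed layers taken from base."""
    merged = dict(cpt)
    for name, value in base.items():
        match = LAYER_RE.search(name)
        if match and int(match.group(1)) in layers_from_base:
            merged[name] = value
    return merged

def change_by_component(base, cpt):
    """L2 norm of (cpt - base), accumulated separately for attention and FFN."""
    squared = {"attn": 0.0, "ffn": 0.0}
    for name, w_base in base.items():
        match = LAYER_RE.search(name)
        if match:
            squared[match.group(2)] += sum(
                (a - b) ** 2 for a, b in zip(cpt[name], w_base)
            )
    return {component: math.sqrt(total) for component, total in squared.items()}

# Toy 4-layer "models": attention weights move more than FFN weights (made-up values).
base = {
    f"layers.{i}.{c}.weight": [0.0, 0.0]
    for i in range(4)
    for c in ("attn", "ffn")
}
cpt = {
    name: ([0.3, 0.4] if ".attn." in name else [0.0, 0.1])
    for name in base
}

# Take the middle layers from base and keep the outer layers from CPT.
merged = swap_layers(base, cpt, layers_from_base={1, 2})
delta = change_by_component(base, cpt)
```

With real checkpoints the same pattern would apply to the models' state dictionaries; which layers count as the semantic hub versus the interface layers is something the paper determines empirically and is not encoded here.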
Anthology ID:
2026.findings-acl.1218
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
24335–24357
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1218/
Cite (ACL):
Khanh-Tung Tran, Vinh-Khanh Tran, Barry O'Sullivan, and Hoang D. Nguyen. 2026. Disentangling Continued Pre-Training: Attention-Driven Routing and Semantic Hub Preservation in Language Adaptation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 24335–24357, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Disentangling Continued Pre-Training: Attention-Driven Routing and Semantic Hub Preservation in Language Adaptation (Tran et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1218.pdf
Checklist:
 2026.findings-acl.1218.checklist.pdf