Layer Duplication in LLMs

Neo Eyal, Nachum Dershowitz, Kfir Bar


Abstract
We investigate the effect of duplicating multi-head self-attention layers in large language models (LLMs) across a range of language tasks, with and without fine-tuning. The results demonstrate that duplicating the initial layers once or twice often yields a significant performance boost. An analysis of attention patterns uncovers the mechanisms that drive this improvement when layers are duplicated. The method enhances LLM capabilities with or without additional training or labeled data.
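The abstract describes the method only at a high level. As a minimal sketch (not the authors' implementation), duplicating an early decoder block can be expressed in HuggingFace Transformers roughly as below; the checkpoint name and the duplicate_layer helper are assumptions for illustration, and the code assumes a Llama-style model whose decoder blocks live in model.model.layers.

```python
import copy

import torch
from transformers import AutoModelForCausalLM

# Illustrative checkpoint (assumption); any Llama-style decoder-only model
# with a `model.model.layers` ModuleList should behave the same way.
MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"


def duplicate_layer(model, layer_idx: int, times: int = 1):
    """Insert `times` deep copies of decoder block `layer_idx` directly after it."""
    layers = list(model.model.layers)
    for _ in range(times):
        layers.insert(layer_idx + 1, copy.deepcopy(layers[layer_idx]))
    model.model.layers = torch.nn.ModuleList(layers)
    # Keep config and per-layer indices consistent so KV caching still works.
    model.config.num_hidden_layers = len(layers)
    for i, layer in enumerate(model.model.layers):
        if hasattr(layer.self_attn, "layer_idx"):
            layer.self_attn.layer_idx = i
    return model


if __name__ == "__main__":
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    # Duplicate the first decoder layer once, mirroring the "initial layers" setting.
    model = duplicate_layer(model, layer_idx=0, times=1)
    print(model.config.num_hidden_layers)
```

Deep-copying gives the duplicate its own independent weights; whether the paper copies or ties parameters between duplicated layers is not stated in the abstract.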
Anthology ID:
2025.findings-emnlp.967
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
17797–17807
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.967/
DOI:
10.18653/v1/2025.findings-emnlp.967
Cite (ACL):
Neo Eyal, Nachum Dershowitz, and Kfir Bar. 2025. Layer Duplication in LLMs. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 17797–17807, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Layer Duplication in LLMs (Eyal et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.967.pdf
Checklist:
2025.findings-emnlp.967.checklist.pdf