Layer Duplication in LLMs
Neo Eyal | Nachum Dershowitz | Kfir Bar
Findings of the Association for Computational Linguistics: EMNLP 2025
We investigate the effect of duplicating multi-head self-attention layers in large language models (LLMs) across a range of language tasks, with and without fine-tuning. The results show that duplicating the initial layers once or twice often yields a significant performance boost. An analysis of attention patterns reveals the mechanisms underlying this improvement. The method enhances LLM capabilities with or without additional training or labeled data.
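A minimal sketch of what layer duplication can look like in practice, assuming a GPT-2-style decoder from Hugging Face Transformers. The paper's abstract does not specify a model or implementation; the model name, the choice to insert each copy directly after its original block, and the parameters `n_layers` and `n_copies` below are illustrative assumptions, not the authors' code.

```python
import copy

import torch.nn as nn
from transformers import GPT2LMHeadModel


def duplicate_initial_layers(model: GPT2LMHeadModel,
                             n_layers: int = 2,
                             n_copies: int = 1) -> GPT2LMHeadModel:
    """Duplicate the first `n_layers` transformer blocks `n_copies` times.

    Each copy is a deep copy of the original block (identical weights),
    inserted right after it, so no extra training is required.
    """
    blocks = list(model.transformer.h)
    new_blocks = []
    for i, block in enumerate(blocks):
        new_blocks.append(block)
        if i < n_layers:
            for _ in range(n_copies):
                new_blocks.append(copy.deepcopy(block))
    model.transformer.h = nn.ModuleList(new_blocks)
    # Keep the config consistent with the new depth.
    model.config.n_layer = len(new_blocks)
    # Note: if KV caching is used during generation, per-block cache
    # indices may need adjusting depending on the Transformers version.
    return model


model = GPT2LMHeadModel.from_pretrained("gpt2")
model = duplicate_initial_layers(model, n_layers=2, n_copies=1)
```

The duplicated model can then be evaluated directly (zero-shot) or fine-tuned on the target task, matching the two settings the abstract describes.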