DiffSkip: Differential Layer Skipping in Large Language Models

Xuan Luo, Weizhi Wang, Xifeng Yan


Abstract
Existing Large Language Models (LLMs) enforce uniform computation across all tokens. We analyze the correlation between the input-output differences of the self-attention block and the Feed-Forward Network (FFN) within the same transformer layer, and find that these two differential vectors are highly correlated. We therefore propose to dynamically skip FFN blocks based on the self-attention difference and introduce Differential Layer Skipping (DiffSkip), showing that LLMs are inherently dynamic-depth models capable of adjusting computational depth when generating different tokens. DiffSkip employs a lightweight router module to dynamically skip a set of FFN blocks in LLMs and requires only efficient fine-tuning of the router while the whole LLM remains frozen. Experimental results demonstrate that DiffSkip effectively enables dynamic FFN skipping in decoder-only language models, even in continuous token generation tasks where many layer-skipping methods struggle.
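
The sketch below illustrates the mechanism the abstract describes: a small trainable router reads the self-attention input-output difference of a layer and decides whether that layer's (frozen) FFN is executed. It is a minimal PyTorch approximation, not the authors' released implementation; the module names (TinyRouter, DiffSkipLayer), the sigmoid scoring head, and the 0.5 skip threshold are illustrative assumptions.

```python
# Minimal sketch of attention-difference-based FFN skipping.
# Assumes attn and ffn map (batch, seq, hidden) -> (batch, seq, hidden)
# and do NOT add their own residual connection.
import torch
import torch.nn as nn


class TinyRouter(nn.Module):
    """Scores each token from the attention residual (output minus input)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, attn_diff: torch.Tensor) -> torch.Tensor:
        # Probability of executing the FFN for each token, shape (B, S, 1).
        return torch.sigmoid(self.proj(attn_diff))


class DiffSkipLayer(nn.Module):
    """Transformer layer whose FFN is bypassed when the router says so."""

    def __init__(self, attn: nn.Module, ffn: nn.Module, hidden_size: int):
        super().__init__()
        self.attn = attn                        # frozen pretrained attention block
        self.ffn = ffn                          # frozen pretrained FFN block
        self.router = TinyRouter(hidden_size)   # only this part is fine-tuned

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out = x + self.attn(x)             # residual self-attention
        attn_diff = attn_out - x                # self-attention input-output difference
        keep = self.router(attn_diff)           # router decision in [0, 1]
        ffn_out = attn_out + self.ffn(attn_out) # residual FFN
        # Hard skip at inference: tokens with keep < 0.5 bypass the FFN.
        mask = (keep >= 0.5).to(x.dtype)
        return mask * ffn_out + (1.0 - mask) * attn_out
```

During training, the soft router scores could instead gate the FFN output directly so gradients flow to the router while the attention and FFN weights stay frozen, consistent with the efficient fine-tuning setup described in the abstract.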
Anthology ID:
2025.findings-acl.377
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7221–7231
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.377/
DOI:
10.18653/v1/2025.findings-acl.377
Cite (ACL):
Xuan Luo, Weizhi Wang, and Xifeng Yan. 2025. DiffSkip: Differential Layer Skipping in Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 7221–7231, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
DiffSkip: Differential Layer Skipping in Large Language Models (Luo et al., Findings 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.377.pdf