Aligning Sizes of Intermediate Layers by LoRA Adapter for Knowledge Distillation

Takeshi Suzuki, Hiroaki Yamada, Takenobu Tokunaga


Abstract
Intermediate Layer Distillation (ILD) is a variant of Knowledge Distillation (KD), a method for compressing neural networks. ILD requires a mapping to align the intermediate layer sizes of the teacher and student models to compute the loss function in training, while this mapping is not used during inference. This inconsistency may reduce the effectiveness of learning in intermediate layers. In this study, we propose LoRAILD, which uses LoRA adapters to eliminate the inconsistency. However, our experimental results show that LoRAILD does not outperform existing methods. Furthermore, contrary to previous studies, we observe that conventional ILD does not outperform vanilla KD. Our analysis of the distilled models’ intermediate layers suggests that ILD does not improve language models’ performance.
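For readers unfamiliar with the setup, below is a minimal sketch (not the paper's implementation) of the conventional ILD loss the abstract refers to: the student's hidden states are passed through a learned linear mapping so their size matches the teacher's, and a distance loss is computed between the aligned layers. The hidden sizes, layer pairing, and MSE loss choice are illustrative assumptions only; the mapping exists only during training, which is the train/inference inconsistency the paper targets.

```python
# Sketch of a conventional ILD loss with a size-aligning linear mapping.
# Dimensions and loss choice are assumptions, not the authors' exact setup.
import torch
import torch.nn as nn


class ILDLoss(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Mapping used only during training and discarded at inference --
        # the inconsistency discussed in the abstract.
        self.proj = nn.Linear(student_dim, teacher_dim)
        self.mse = nn.MSELoss()

    def forward(self, student_hidden: torch.Tensor,
                teacher_hidden: torch.Tensor) -> torch.Tensor:
        # student_hidden: (batch, seq_len, student_dim)
        # teacher_hidden: (batch, seq_len, teacher_dim)
        return self.mse(self.proj(student_hidden), teacher_hidden)


# Toy usage with hypothetical sizes (384-dim student, 768-dim teacher).
loss_fn = ILDLoss(student_dim=384, teacher_dim=768)
student_h = torch.randn(2, 16, 384)
teacher_h = torch.randn(2, 16, 768)
loss = loss_fn(student_h, teacher_h)
```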
Anthology ID:
2025.insights-1.10
Volume:
The Sixth Workshop on Insights from Negative Results in NLP
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Aleksandr Drozd, João Sedoc, Shabnam Tafreshi, Arjun Akula, Raphael Shu
Venues:
insights | WS
Publisher:
Association for Computational Linguistics
Pages:
100–105
URL:
https://preview.aclanthology.org/moar-dois/2025.insights-1.10/
DOI:
10.18653/v1/2025.insights-1.10
Cite (ACL):
Takeshi Suzuki, Hiroaki Yamada, and Takenobu Tokunaga. 2025. Aligning Sizes of Intermediate Layers by LoRA Adapter for Knowledge Distillation. In The Sixth Workshop on Insights from Negative Results in NLP, pages 100–105, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Aligning Sizes of Intermediate Layers by LoRA Adapter for Knowledge Distillation (Suzuki et al., insights 2025)
PDF:
https://preview.aclanthology.org/moar-dois/2025.insights-1.10.pdf