Abstract
Knowledge distillation is an effective technique for compressing over-parameterized language models. In this work, we propose breaking the global feature distillation task into N local sub-tasks. In this framework, each neuron in the last hidden layer of the teacher network acts as a specialized sub-teacher, and each neuron in the last hidden layer of the student network acts as a focused sub-student. Each focused sub-student learns from its one corresponding specialized sub-teacher and ignores the others, which simplifies the sub-student's task and keeps it focused. Our proposed method is novel and can be combined with other distillation techniques. Empirical results show that our approach outperforms state-of-the-art methods, maintaining higher performance on most benchmark datasets. Furthermore, we propose a randomized variant, Masked One-to-One Mapping: rather than learning all N sub-tasks simultaneously, the student learns only a subset of them at each optimization step. This variant enables the student to digest the received flow of knowledge more effectively and yields superior results.
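To give a concrete picture of the idea, here is a minimal PyTorch-style sketch of a per-neuron (one-to-one) feature distillation loss with an optional random mask over the sub-tasks. The function name `one_to_one_distillation_loss`, the argument `mask_ratio`, and the mean-squared formulation are our own assumptions for illustration, not the exact loss from the paper; the sketch also assumes the student and teacher last hidden layers have the same width (or that the student features have already been projected to that width).

```python
import torch


def one_to_one_distillation_loss(student_hidden, teacher_hidden, mask_ratio=0.0):
    """Sketch of one-to-one feature distillation (not the paper's exact loss).

    student_hidden, teacher_hidden: [batch, hidden_dim] last-hidden-layer
    features of matching width. Student neuron i is matched only to teacher
    neuron i, so each of the N hidden dimensions becomes a local sub-task.
    With mask_ratio > 0, a random subset of sub-tasks is skipped at this
    optimization step (the "Masked One-to-One Mapping" variant).
    """
    # One local sub-task per hidden dimension: squared error averaged over the batch.
    per_neuron_loss = ((student_hidden - teacher_hidden.detach()) ** 2).mean(dim=0)  # [hidden_dim]

    if mask_ratio > 0.0:
        # Randomly keep only a fraction (1 - mask_ratio) of the sub-tasks this step.
        keep = (torch.rand_like(per_neuron_loss) >= mask_ratio).float()
        return (per_neuron_loss * keep).sum() / keep.sum().clamp(min=1.0)

    # Plain one-to-one mapping: average over all N sub-tasks.
    return per_neuron_loss.mean()
```

In practice this term would be added, with some weight, to the student's task loss (and possibly to a logit-distillation loss), and `mask_ratio` would control how many of the N sub-tasks are learned per step.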
- Anthology ID: 2023.findings-emnlp.882
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 13235–13245
- URL: https://aclanthology.org/2023.findings-emnlp.882
- DOI: 10.18653/v1/2023.findings-emnlp.882
- Cite (ACL): Khouloud Saadi, Jelena Mitrović, and Michael Granitzer. 2023. Learn From One Specialized Sub-Teacher: One-to-One Mapping for Feature-Based Knowledge Distillation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13235–13245, Singapore. Association for Computational Linguistics.
- Cite (Informal): Learn From One Specialized Sub-Teacher: One-to-One Mapping for Feature-Based Knowledge Distillation (Saadi et al., Findings 2023)
- PDF: https://preview.aclanthology.org/emnlp22-frontmatter/2023.findings-emnlp.882.pdf