Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping
Yijie Chen, Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou
Abstract
Knowledge Distillation (KD) has emerged as a prominent technique for model compression. However, conventional KD approaches primarily focus on homogeneous architectures with identical tokenizers, constraining their applicability in cross-architecture scenarios. In cross-tokenizer KD, differences between the tokenizers give rise to two fundamental challenges: (1) sequence misalignment caused by divergent tokenization strategies, and (2) mismatched vocabulary size and composition. While existing probability-matching methods attempt to address these issues, their efficacy remains limited due to suboptimal alignment in both the sequence and vocabulary aspects. To overcome these limitations, we propose Contextual Dynamic Mapping (CDM), a novel cross-tokenizer distillation framework that employs contextual information to enhance sequence alignment precision and dynamically improves vocabulary mapping. We evaluate the effectiveness of our approach across five advanced and widely used model families (i.e., Llama3, Phi3, Gemma2, OPT, and Qwen2), configured into three distinct teacher-student pairs. Our method shows significant advantages over existing cross-tokenizer distillation baselines across diverse benchmarks, including instruction following, code generation, and math. Notably, our analysis reveals that combining conventional same-tokenizer distillation with cross-tokenizer distillation through CDM yields further performance improvements.
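To make the two challenges described in the abstract concrete, here is a minimal Python sketch (not the paper's CDM method) that assumes the Hugging Face transformers library and two illustrative public checkpoints drawn from the model families listed above (Qwen2 as teacher, OPT as student). It only demonstrates how the same text is tokenized into misaligned sequences and how the two vocabularies differ in size and overlap.

```python
# Illustration only: shows the cross-tokenizer mismatches a method like CDM
# must bridge. Checkpoint names are illustrative choices, not from the paper.
from transformers import AutoTokenizer

teacher_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B")      # teacher vocabulary
student_tok = AutoTokenizer.from_pretrained("facebook/opt-1.3b")  # student vocabulary

text = "Cross-tokenizer distillation must align mismatched token sequences."

teacher_ids = teacher_tok(text, add_special_tokens=False)["input_ids"]
student_ids = student_tok(text, add_special_tokens=False)["input_ids"]

# Challenge (1): sequence misalignment -- the same text yields sequences with
# different lengths and token boundaries, so positions cannot be matched 1:1.
print(len(teacher_ids), teacher_tok.convert_ids_to_tokens(teacher_ids))
print(len(student_ids), student_tok.convert_ids_to_tokens(student_ids))

# Challenge (2): vocabulary mismatch -- the vocabularies differ in size and
# many token strings exist in only one of the two vocabularies.
teacher_vocab = set(teacher_tok.get_vocab())
student_vocab = set(student_tok.get_vocab())
print(len(teacher_vocab), len(student_vocab), len(teacher_vocab & student_vocab))
```

Any cross-tokenizer distillation approach, including the CDM framework proposed here, has to bridge exactly these two gaps before teacher and student probability distributions can be compared.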
- Anthology ID: 2025.findings-acl.419
- Volume: Findings of the Association for Computational Linguistics: ACL 2025
- Month: July
- Year: 2025
- Address: Vienna, Austria
- Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 8005–8018
- URL: https://preview.aclanthology.org/corrections-2025-08/2025.findings-acl.419/
- DOI: 10.18653/v1/2025.findings-acl.419
- Cite (ACL): Yijie Chen, Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, and Jie Zhou. 2025. Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping. In Findings of the Association for Computational Linguistics: ACL 2025, pages 8005–8018, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal): Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping (Chen et al., Findings 2025)
- PDF: https://preview.aclanthology.org/corrections-2025-08/2025.findings-acl.419.pdf