Cross-Tokenizer LLM Distillation through a Byte-Level Interface
Avyav Kumar Singh, Yen-Chen Wu, Alexandru Cioba, Alberto Bernacchia, Davide Buffelli
Abstract
Cross-tokenizer distillation (CTD), the transfer of knowledge from a teacher to a student language model when the two use different tokenizers, remains a largely unsolved problem. Existing approaches rely on heuristic strategies to align mismatched vocabularies, introducing considerable complexity. In this paper, we propose a simple but effective baseline called Byte-Level Distillation (BLD), which enables CTD by operating at a common interface across tokenizers: the byte level. Specifically, we convert the teacher’s output distribution to byte-level probabilities, attach a lightweight byte-level decoder head to the student, and distill through this shared byte-level interface. Despite its simplicity, BLD performs competitively with, and on several benchmarks surpasses, significantly more sophisticated CTD methods across a range of distillation tasks with models from 1B to 8B parameters. Our results suggest that the byte level is a natural common ground for cross-tokenizer knowledge transfer, while also highlighting that consistent improvements across all tasks and benchmarks remain elusive, underscoring that CTD is still an open problem.
- Anthology ID:
- 2026.customnlp4u-1.9
- Volume:
- Proceedings of the Second Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, USA
- Editors:
- Sheshera Mysore, Sachin Kumar, Vidhisha Balachandran, Shirley Anugrah Hayati, Faeze Brahman, Hanane Nour Moussa, Alireza Salemi
- Venues:
- CustomNLP4U | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 84–96
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.customnlp4u-1.9/
- Cite (ACL):
- Avyav Kumar Singh, Yen-Chen Wu, Alexandru Cioba, Alberto Bernacchia, and Davide Buffelli. 2026. Cross-Tokenizer LLM Distillation through a Byte-Level Interface. In Proceedings of the Second Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U), pages 84–96, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal):
- Cross-Tokenizer LLM Distillation through a Byte-Level Interface (Singh et al., CustomNLP4U 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.customnlp4u-1.9.pdf
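
The abstract's central idea, that bytes form a common interface across tokenizers, can be illustrated with a toy sketch. The abstract does not specify how BLD converts token probabilities to byte probabilities, so the code below is an assumption rather than the authors' method: it handles only the simplest case, collapsing a next-token distribution onto the first UTF-8 byte of each token. The function names, vocabularies, and probabilities are all hypothetical.

```python
import math
from collections import defaultdict


def first_byte_distribution(token_probs):
    """Collapse a next-token distribution onto the first UTF-8 byte of each token."""
    byte_probs = defaultdict(float)
    for token, prob in token_probs.items():
        encoded = token.encode("utf-8")
        if encoded:  # ignore empty strings, which carry no byte
            byte_probs[encoded[0]] += prob
    total = sum(byte_probs.values())
    return {byte: prob / total for byte, prob in byte_probs.items()}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over byte values; eps guards against bytes missing from q."""
    return sum(
        pb * math.log(pb / max(q.get(b, 0.0), eps))
        for b, pb in p.items()
        if pb > 0
    )


# Hypothetical next-token distributions from two different vocabularies.
teacher_tokens = {"the": 0.5, "then": 0.2, "a": 0.3}
student_tokens = {"th": 0.6, "t": 0.1, "an": 0.3}

# Both models are now compared in the same 256-way byte space.
teacher_bytes = first_byte_distribution(teacher_tokens)
student_bytes = first_byte_distribution(student_tokens)
loss = kl_divergence(teacher_bytes, student_bytes)
```

Although the two vocabularies share no tokens, both collapse to the same byte distribution here (0.7 on `t`, 0.3 on `a`), so the divergence is approximately zero. A token-level comparison would instead need some vocabulary alignment before any loss could be computed.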