Distillation Traps and Guards: A Calibration Knob for LLM Distillability

Weixiao Zhan; Yongcheng Jing; Leszek Rutkowski; Dacheng Tao

Abstract

Knowledge distillation (KD) transfers capabilities from large language models (LLMs) to smaller students, yet it can fail unpredictably and also underpins model leakage risks. Our analysis reveals several distillation traps that distort training signals: tail noise, off-policy instability, and, most fundamentally, the teacher–student gap. These traps manifest as overconfident hallucinations, self-correction collapse, and local decoding degradation, causing distillation to fail. Motivated by these findings, we propose a post-hoc calibration method that, to the best of our knowledge, is the first to enable control over a teacher’s distillability, using reinforcement fine-tuning (RFT). Our objective combines a task-utility term, a KL anchor, and a cross-tokenizer calibration reward. This makes distillability a practical safety lever for foundation models, connecting robust teacher–student transfer with deployment-aware model protection. Experiments across math, knowledge QA, and instruction-following tasks show that students distilled from teachers calibrated to be distillable outperform SFT and KD baselines, while teachers calibrated to be undistillable retain their task performance but cause distilled students to collapse, offering a practical knob for both better KD and model IP protection.
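The abstract names the three ingredients of the calibration objective but not how they are combined. As a rough illustration only, the following sketch assumes a simple weighted sum: a task-utility score, minus a KL anchor that keeps the calibrated teacher close to its original (reference) distribution, plus a calibration reward scaled by a signed weight. The function names, the coefficients `beta` and `lam`, the per-position KL simplification, and the treatment of the cross-tokenizer calibration reward as a precomputed scalar are all hypothetical assumptions, not the authors' formulation.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same support.

    Terms with p_i == 0 contribute nothing, following the 0 * log 0 = 0 convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def combined_reward(
    task_utility: float,
    policy_probs: Sequence[float],
    ref_probs: Sequence[float],
    calibration_reward: float,
    beta: float = 0.1,
    lam: float = 1.0,
) -> float:
    """Hypothetical scalar reward for RFT of a teacher (assumed weighted sum).

    - task_utility: score for solving the task (assumed given, e.g. correctness).
    - KL anchor: penalizes drift of the tuned teacher (policy_probs) away from
      the original teacher (ref_probs); simplified here to one token position.
    - calibration_reward: assumed precomputed scalar standing in for the
      paper's cross-tokenizer calibration reward.
    - lam: signed weight; in this sketch lam > 0 rewards the calibration term
      and lam < 0 penalizes it, mimicking a distillability "knob".
    """
    kl_anchor = kl_divergence(policy_probs, ref_probs)
    return task_utility - beta * kl_anchor + lam * calibration_reward


if __name__ == "__main__":
    same = [0.5, 0.5]
    # Identical distributions give a zero KL anchor, so only utility and calibration remain.
    print(combined_reward(1.0, same, same, 0.2, beta=0.1, lam=1.0))
    print(combined_reward(1.0, same, same, 0.2, beta=0.1, lam=-1.0))
```

Flipping the sign of `lam` in this sketch turns the same calibration signal from a bonus into a penalty while the KL anchor limits how far the teacher drifts, which is one plausible reading of how a single objective could produce both distillable and undistillable teachers; the paper itself may define the knob differently.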

Anthology ID: 2026.acl-long.908
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 19814–19833
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.908/
Cite (ACL): Weixiao Zhan, Yongcheng Jing, Leszek Rutkowski, and Dacheng Tao. 2026. Distillation Traps and Guards: A Calibration Knob for LLM Distillability. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 19814–19833, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Distillation Traps and Guards: A Calibration Knob for LLM Distillability (Zhan et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.908.pdf
Checklist: 2026.acl-long.908.checklist.pdf
