Memorization Inheritance in Sequence-Level Knowledge Distillation for Neural Machine Translation

Verna Dankers, Vikas Raunak


Abstract
In this work, we explore how instance-level memorization in the teacher Neural Machine Translation (NMT) model is inherited by the student model in sequence-level knowledge distillation (SeqKD). We find that, despite never directly seeing the original training data, students memorize more than baseline models (models of the same size, trained on the original data), with increases of 3.4% for exact matches and 57% for extractive memorization, and show increased hallucination rates. Further, under this SeqKD setting, we characterize how students behave on particular training-data subgroups, such as subgroups with low quality or with specific counterfactual memorization (CM) scores, and find that students exhibit greater denoising on low-quality subgroups. Finally, we propose Adaptive-SeqKD, a modification of SeqKD that intervenes during distillation to reduce memorization and hallucinations. Overall, we recommend caution when applying SeqKD: students inherit both their teachers' superior performance and their failure modes, and therefore require active monitoring.
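To make the two ingredients of the abstract concrete, below is a minimal, self-contained Python sketch of (i) SeqKD data construction, where the student is trained on the teacher's translations of the training sources rather than the original references, and (ii) an exact-match memorization check over the original training pairs. The function names (`build_seqkd_data`, `exact_match_memorization`) and the toy lookup-table models are illustrative assumptions, not the paper's implementation; any NMT model exposing a source-to-translation function (e.g., beam-search decoding) could be plugged in.

```python
# Sketch of SeqKD data construction and exact-match memorization measurement.
# The toy translate() callables below are stand-ins, not the authors' code.

from typing import Callable, List, Tuple

TranslateFn = Callable[[str], str]  # maps a source sentence to a translation


def build_seqkd_data(teacher: TranslateFn,
                     train_sources: List[str]) -> List[Tuple[str, str]]:
    """SeqKD: re-label the training sources with the teacher's own
    translations; the student never sees the original references."""
    return [(src, teacher(src)) for src in train_sources]


def exact_match_memorization(model: TranslateFn,
                             train_pairs: List[Tuple[str, str]]) -> float:
    """Fraction of original training pairs whose reference translation the
    model reproduces verbatim (one instance-level memorization metric)."""
    hits = sum(model(src) == ref for src, ref in train_pairs)
    return hits / len(train_pairs)


if __name__ == "__main__":
    # Toy stand-ins for trained models: lookup tables over the training data.
    train_pairs = [("the cat sat", "die Katze sass"),
                   ("good morning", "guten Morgen")]
    teacher = dict(train_pairs).get  # behaves as if it memorized everything

    # Student training data under SeqKD: (source, teacher translation) pairs.
    distill_data = build_seqkd_data(teacher, [s for s, _ in train_pairs])

    # A trained student would be scored the same way against the ORIGINAL pairs.
    print("teacher exact-match rate:",
          exact_match_memorization(teacher, train_pairs))
```

A student model trained on `distill_data` can only reproduce an original reference if the teacher emitted it verbatim, which is why a nonzero student exact-match rate indicates memorization inherited through the teacher's outputs rather than from the original corpus.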
Anthology ID:
2025.acl-short.61
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
760–774
URL:
https://preview.aclanthology.org/landing_page/2025.acl-short.61/
Cite (ACL):
Verna Dankers and Vikas Raunak. 2025. Memorization Inheritance in Sequence-Level Knowledge Distillation for Neural Machine Translation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 760–774, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Memorization Inheritance in Sequence-Level Knowledge Distillation for Neural Machine Translation (Dankers & Raunak, ACL 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.acl-short.61.pdf