Online Distilling from Checkpoints for Neural Machine Translation

Hao-Ran Wei, Shujian Huang, Ran Wang, Xin-yu Dai, Jiajun Chen


Abstract
Current predominant neural machine translation (NMT) models typically have deep structures with large numbers of parameters, making them hard to train and prone to over-fitting. A common practice is to use a validation set to monitor training and select the best checkpoint. Averaging or ensembling checkpoints can further improve performance. However, since these techniques do not affect the training process itself, system performance is limited by the checkpoints produced during the original training run. In contrast, we propose an online knowledge distillation method. Our method generates a teacher model on the fly from the best checkpoint so far, which guides the ongoing training toward better performance. Experiments on several datasets and language pairs show steady improvements over a strong self-attention-based baseline system. We also analyze its effectiveness against over-fitting in data-limited settings. Furthermore, our method also yields improvements in a machine reading experiment.
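To make the training loop described in the abstract concrete, below is a minimal PyTorch-style sketch of online distillation from checkpoints: the model trains with standard cross-entropy until a validated checkpoint exists, after which the best checkpoint so far serves as a frozen teacher whose output distribution is mixed into the loss. The model interface, the `valid_fn` metric, the interpolation weight `alpha`, and the simple best-checkpoint update rule are illustrative assumptions, not the paper's exact formulation.

```python
import copy
import torch
import torch.nn.functional as F

# Hypothetical interface: model(src, tgt) returns per-token logits of shape
# (batch, tgt_len, vocab); valid_fn(model) returns a validation score (e.g. BLEU).

def nll_loss(logits, targets, pad_id=0):
    # Standard cross-entropy against the reference translation.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), ignore_index=pad_id
    )

def distill_loss(student_logits, teacher_logits, targets, alpha=0.5, pad_id=0):
    # Interpolate the usual cross-entropy with a KL term toward the teacher's
    # output distribution (word-level knowledge distillation).
    kd = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    return (1.0 - alpha) * nll_loss(student_logits, targets, pad_id) + alpha * kd

def train(student, batches, valid_fn, optimizer, eval_every=1000, alpha=0.5):
    teacher, best_score = None, float("-inf")
    for step, (src, tgt) in enumerate(batches, start=1):
        student_logits = student(src, tgt)
        if teacher is None:
            loss = nll_loss(student_logits, tgt)      # plain MLE before a teacher exists
        else:
            with torch.no_grad():
                teacher_logits = teacher(src, tgt)    # teacher is kept frozen
            loss = distill_loss(student_logits, teacher_logits, tgt, alpha)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Periodic validation: whenever the current model beats the best score so
        # far, a copy of it becomes the new teacher for subsequent training steps.
        if step % eval_every == 0:
            score = valid_fn(student)
            if score > best_score:
                best_score = score
                teacher = copy.deepcopy(student).eval()
    return student
```

The key design point is that the teacher is refreshed on the fly whenever a better checkpoint appears, so the distillation signal keeps pace with training instead of being fixed to a pre-trained teacher.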
Anthology ID:
N19-1192
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Jill Burstein, Christy Doran, Thamar Solorio
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1932–1941
URL:
https://aclanthology.org/N19-1192
DOI:
10.18653/v1/N19-1192
Cite (ACL):
Hao-Ran Wei, Shujian Huang, Ran Wang, Xin-yu Dai, and Jiajun Chen. 2019. Online Distilling from Checkpoints for Neural Machine Translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1932–1941, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Online Distilling from Checkpoints for Neural Machine Translation (Wei et al., NAACL 2019)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/N19-1192.pdf