Abstract
We propose a novel procedure for training multiple Transformers with tied parameters, which compresses multiple models into one and enables the dynamic choice of the number of encoder and decoder layers during decoding. When training an encoder-decoder model, the output of the last layer of the N-layer encoder is typically fed to the M-layer decoder, and the output of the last decoder layer is used to compute the loss. Instead, our method computes a single loss consisting of N×M losses, where each loss is computed from the output of one of the M decoder layers connected to one of the N encoder layers. Such a model subsumes N×M models with different numbers of encoder and decoder layers and can be used for decoding with fewer than the maximum number of encoder and decoder layers. Given our flexible tied model, we also address the a-priori selection of the number of encoder and decoder layers for faster decoding, and explore recurrent stacking of layers and knowledge distillation for model compression. We present a cost-benefit analysis of applying the proposed approaches to neural machine translation and show that they reduce decoding costs while preserving translation quality.
- Anthology ID:
- 2020.ngt-1.3
- Volume:
- Proceedings of the Fourth Workshop on Neural Generation and Translation
- Month:
- July
- Year:
- 2020
- Address:
- Online
- Editors:
- Alexandra Birch, Andrew Finch, Hiroaki Hayashi, Kenneth Heafield, Marcin Junczys-Dowmunt, Ioannis Konstas, Xian Li, Graham Neubig, Yusuke Oda
- Venue:
- NGT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 24–34
- URL:
- https://aclanthology.org/2020.ngt-1.3
- DOI:
- 10.18653/v1/2020.ngt-1.3
- Cite (ACL):
- Raj Dabre, Raphael Rubino, and Atsushi Fujita. 2020. Balancing Cost and Benefit with Tied-Multi Transformers. In Proceedings of the Fourth Workshop on Neural Generation and Translation, pages 24–34, Online. Association for Computational Linguistics.
- Cite (Informal):
- Balancing Cost and Benefit with Tied-Multi Transformers (Dabre et al., NGT 2020)
- PDF:
- https://preview.aclanthology.org/teach-a-man-to-fish/2020.ngt-1.3.pdf
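The N×M loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the encoder and decoder "layers" below are toy scalar functions standing in for Transformer layers, and the loss is a squared error standing in for cross-entropy. The sketch only shows the control flow: the encoder runs once, the output of every encoder layer is kept, and a loss is accumulated at every decoder depth for each encoder depth.

```python
N, M = 3, 2  # hypothetical encoder/decoder depths

def enc_layer(h):       # stand-in for one Transformer encoder layer
    return h + 1.0

def dec_layer(h, ctx):  # stand-in for one Transformer decoder layer
    return h + 0.5 * ctx

def loss_fn(output, target):  # stand-in for cross-entropy
    return (output - target) ** 2

def tied_multi_loss(src, tgt_init, target):
    # Run the encoder once, keeping the output of every layer.
    enc_states = []
    h = src
    for _ in range(N):
        h = enc_layer(h)
        enc_states.append(h)
    # For each encoder depth, run the decoder and accumulate a loss
    # at every decoder depth: N x M losses in total.
    total, count = 0.0, 0
    for ctx in enc_states:
        d = tgt_init
        for _ in range(M):
            d = dec_layer(d, ctx)
            total += loss_fn(d, target)
            count += 1
    assert count == N * M
    return total

print(tied_multi_loss(0.0, 0.0, 1.0))  # → 5.5
```

Because every (n, m) pair contributes to the single training loss, each prefix of encoder layers paired with each prefix of decoder layers is itself a usable model at decoding time, which is what permits decoding with fewer than N encoder or M decoder layers.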