Abstract
We speed up Neural Machine Translation (NMT) decoding by shrinking run-time target vocabulary. We experiment with two shrinking approaches: Locality Sensitive Hashing (LSH) and word alignments. Using the latter method, we get a 2x overall speed-up over a highly-optimized GPU implementation, without hurting BLEU. On certain low-resource language pairs, the same methods improve BLEU by 0.5 points. We also report a negative result for LSH on GPUs, due to relatively large overhead, though it was successful on CPUs. Compared with Locality Sensitive Hashing (LSH), decoding with word alignments is GPU-friendly, orthogonal to existing speedup methods and more robust across language pairs.
- Anthology ID:
- P17-2091
- Volume:
- Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
- Month:
- July
- Year:
- 2017
- Address:
- Vancouver, Canada
- Editors:
- Regina Barzilay, Min-Yen Kan
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 574–579
- URL:
- https://aclanthology.org/P17-2091
- DOI:
- 10.18653/v1/P17-2091
- Cite (ACL):
- Xing Shi and Kevin Knight. 2017. Speeding Up Neural Machine Translation Decoding by Shrinking Run-time Vocabulary. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 574–579, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Speeding Up Neural Machine Translation Decoding by Shrinking Run-time Vocabulary (Shi & Knight, ACL 2017)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/P17-2091.pdf