Abstract
Neural machine translation has achieved levels of fluency and adequacy that would have been surprising a short time ago. Output quality is extremely relevant for industry purposes; however, it is equally important to produce results in the shortest time possible, mainly for latency-sensitive applications and to control cloud hosting costs. In this paper we show the effectiveness of translating with 8-bit quantization for models that have been trained using 32-bit floating-point values. Results show that 8-bit translation yields a non-negligible speedup with no degradation in accuracy or adequacy.
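The page does not include code; the snippet below is only a minimal, hypothetical sketch of the general idea the abstract describes: taking weights trained in 32-bit floats, mapping them to 8-bit integers with a per-tensor scale, and running the multiply in integer arithmetic. The scaling scheme, function names, and use of NumPy are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (not from the paper): symmetric post-training int8 quantization
# of fp32 weights, with the matrix-vector product accumulated in int32.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map fp32 values to int8 using a single (per-tensor) scale factor."""
    scale = np.max(np.abs(w)) / 127.0                     # assumed scaling choice
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matvec(q_w: np.ndarray, w_scale: float, x: np.ndarray) -> np.ndarray:
    """Quantize activations, multiply in integer arithmetic, rescale to fp32."""
    q_x, x_scale = quantize_int8(x)
    acc = q_w.astype(np.int32) @ q_x.astype(np.int32)     # int32 accumulator
    return acc.astype(np.float32) * (w_scale * x_scale)

# Usage: the quantized product stays close to the fp32 reference.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
x = rng.standard_normal(8).astype(np.float32)
q_w, w_scale = quantize_int8(w)
print(w @ x)
print(int8_matvec(q_w, w_scale, x))
```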
- Anthology ID:
- N18-3014
- Volume:
- Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)
- Month:
- June
- Year:
- 2018
- Address:
- New Orleans, Louisiana
- Editors:
- Srinivas Bangalore, Jennifer Chu-Carroll, Yunyao Li
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 114–120
- URL:
- https://aclanthology.org/N18-3014
- DOI:
- 10.18653/v1/N18-3014
- Cite (ACL):
- Jerry Quinn and Miguel Ballesteros. 2018. Pieces of Eight: 8-bit Neural Machine Translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers), pages 114–120, New Orleans, Louisiana. Association for Computational Linguistics.
- Cite (Informal):
- Pieces of Eight: 8-bit Neural Machine Translation (Quinn & Ballesteros, NAACL 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/N18-3014.pdf