Extreme Model Compression for On-device Natural Language Understanding
Kanthashree Mysore Sathyendra, Samridhi Choudhary, Leah Nicolich-Henkin
Abstract
In this paper, we propose and experiment with techniques for extreme compression of neural natural language understanding (NLU) models, making them suitable for execution on resource-constrained devices. We propose a task-aware, end-to-end compression approach that performs word-embedding compression jointly with NLU task learning. We show our results on a large-scale, commercial NLU system trained on a varied set of intents with huge vocabulary sizes. Our approach outperforms a range of baselines and achieves a compression rate of 97.4% with less than 3.7% degradation in predictive performance. Our analysis indicates that the signal from the downstream task is important for effective compression with minimal degradation in performance.
- Anthology ID:
- 2020.coling-industry.15
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics: Industry Track
- Month:
- December
- Year:
- 2020
- Address:
- Online
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 160–171
- URL:
- https://aclanthology.org/2020.coling-industry.15
- DOI:
- 10.18653/v1/2020.coling-industry.15
- Cite (ACL):
- Kanthashree Mysore Sathyendra, Samridhi Choudhary, and Leah Nicolich-Henkin. 2020. Extreme Model Compression for On-device Natural Language Understanding. In Proceedings of the 28th International Conference on Computational Linguistics: Industry Track, pages 160–171, Online. International Committee on Computational Linguistics.
- Cite (Informal):
- Extreme Model Compression for On-device Natural Language Understanding (Mysore Sathyendra et al., COLING 2020)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/2020.coling-industry.15.pdf