MEMORY-VQ: Compression for Tractable Internet-Scale Memory
Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontanon, William Cohen, Sumit Sanghai, Joshua Ainslie
Abstract
Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods like LUMEN (de Jong et al., 2023a) pre-compute token representations for retrieved passages to drastically speed up inference. However, memory also leads to much greater storage requirements from storing pre-computed representations. We propose MEMORY-VQ, a new method to reduce storage requirements of memory-augmented models without sacrificing performance. Our method uses a vector quantization variational autoencoder (VQ-VAE) to compress token representations. We apply MEMORY-VQ to the LUMEN model to obtain LUMEN-VQ, a memory model that achieves a 16x compression rate with comparable performance on the KILT benchmark. LUMEN-VQ enables practical retrieval augmentation even for extremely large retrieval corpora.
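To make the storage argument concrete, here is a minimal NumPy sketch of the kind of quantization the abstract describes: pre-computed token vectors are replaced by small integer codes into learned codebooks, and a cheap lookup reconstructs approximate vectors at inference time. The sizes (`D_TOKEN`, `N_GROUPS`, `CODEBOOK_SIZE`) and the random "trained" codebooks are illustrative assumptions, not the paper's actual LUMEN-VQ configuration, where codebooks are learned with a VQ-VAE.

```python
import numpy as np

# Illustrative sizes only; the paper's codebook configuration may differ.
D_TOKEN = 128        # dimension of a pre-computed token representation
N_GROUPS = 8         # split each vector into groups of dimensions
CODEBOOK_SIZE = 256  # entries per codebook -> one uint8 index per group

rng = np.random.default_rng(0)

# One codebook per group, assumed already trained (a VQ-VAE would learn
# these jointly with the model; random values stand in here).
codebooks = rng.normal(size=(N_GROUPS, CODEBOOK_SIZE, D_TOKEN // N_GROUPS))

def compress(tokens: np.ndarray) -> np.ndarray:
    """Map float token representations [n, D_TOKEN] to uint8 codes [n, N_GROUPS]."""
    groups = tokens.reshape(len(tokens), N_GROUPS, -1)            # [n, G, d]
    # Nearest codebook entry per group by Euclidean distance.
    dists = np.linalg.norm(groups[:, :, None, :] - codebooks[None], axis=-1)
    return dists.argmin(axis=-1).astype(np.uint8)                 # [n, G]

def decompress(codes: np.ndarray) -> np.ndarray:
    """Reconstruct approximate token representations from stored codes."""
    groups = codebooks[np.arange(N_GROUPS)[None, :], codes]       # [n, G, d]
    return groups.reshape(len(codes), D_TOKEN)

tokens = rng.normal(size=(4, D_TOKEN)).astype(np.float32)
codes = compress(tokens)
approx = decompress(codes)
# Storage per token: 128 floats * 4 bytes = 512 bytes uncompressed
# vs. 8 uint8 codes = 8 bytes, a 64x reduction in this toy setup
# (the paper reports 16x for LUMEN-VQ with comparable task quality).
print(codes.shape, approx.shape)
```

The compression comes entirely from storing the small integer codes instead of the float vectors; only the (comparatively tiny) codebooks must be kept alongside them, and decompression is a single lookup per group.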
- Anthology ID: 2024.naacl-short.64
- Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
- Month: June
- Year: 2024
- Address: Mexico City, Mexico
- Editors: Kevin Duh, Helena Gomez, Steven Bethard
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 737–744
- URL: https://aclanthology.org/2024.naacl-short.64
- Cite (ACL): Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontanon, William Cohen, Sumit Sanghai, and Joshua Ainslie. 2024. MEMORY-VQ: Compression for Tractable Internet-Scale Memory. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pages 737–744, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal): MEMORY-VQ: Compression for Tractable Internet-Scale Memory (Zemlyanskiy et al., NAACL 2024)
- PDF: https://preview.aclanthology.org/ingestion-checklist/2024.naacl-short.64.pdf