Memory Tokens: Large Language Models Can Generate Reversible Sentence Embeddings

Ignacio Sastre, Aiala Rosá


Abstract
In this work, we observe an interesting phenomenon: it is possible to generate reversible sentence embeddings that allow an LLM to reconstruct the original text exactly, without modifying the model’s weights. This is achieved by introducing a special memory token, whose embedding is optimized through training on a fixed sequence. When prompted with this embedding, the model reconstructs the fixed sequence exactly. We evaluate this phenomenon across English and Spanish datasets, sequences of up to approximately 240 tokens, and model scales ranging from 100M to 8B parameters. Notably, Llama 3.1 8B successfully reconstructs all tested sequences. Our findings highlight an interesting capability of LLMs and suggest potential applications in memory-based retrieval, compression, and controlled text generation.
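The abstract describes optimizing the embedding of a single special memory token against a frozen causal LM so that, prompted with only that embedding, the model regenerates a fixed sequence. Below is a minimal, hypothetical sketch of such an optimization loop, assuming a PyTorch / Hugging Face setup; the model name, learning rate, step count, and helper names are illustrative assumptions, not the authors' configuration (the paper reports models from 100M up to Llama 3.1 8B).

```python
# Illustrative sketch only: train one "memory token" embedding for a frozen LM
# so that the model reconstructs a fixed target sequence from that embedding.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; the paper scales up to Llama 3.1 8B
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
for p in model.parameters():      # freeze the LM weights entirely;
    p.requires_grad_(False)       # only the memory embedding is trained

target_text = "The quick brown fox jumps over the lazy dog."
target_ids = tok(target_text, return_tensors="pt").input_ids      # (1, T)

emb_layer = model.get_input_embeddings()
hidden = emb_layer.embedding_dim
# A single trainable vector acting as the memory token's embedding.
memory = torch.nn.Parameter(torch.randn(1, 1, hidden) * 0.02)
optim = torch.optim.Adam([memory], lr=1e-3)

for step in range(1000):                        # step count is arbitrary here
    optim.zero_grad()
    tgt_embeds = emb_layer(target_ids)                      # (1, T, H)
    inputs_embeds = torch.cat([memory, tgt_embeds], dim=1)  # prepend memory
    # Standard causal-LM loss; -100 masks the memory position's label.
    labels = torch.cat(
        [torch.full((1, 1), -100, dtype=torch.long), target_ids], dim=1
    )
    out = model(inputs_embeds=inputs_embeds, labels=labels)
    out.loss.backward()
    optim.step()

# Reconstruction: prompt the frozen model with only the learned embedding.
with torch.no_grad():
    generated = model.generate(
        inputs_embeds=memory,
        max_new_tokens=target_ids.shape[1],
        do_sample=False,
        pad_token_id=tok.eos_token_id,
    )
print(tok.decode(generated[0], skip_special_tokens=True))
```

Under this reading, exact reconstruction means the greedy decode from the learned embedding matches the target token sequence; the embedding itself then serves as a reversible representation of the sentence.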
Anthology ID: 2025.l2m2-1.14
Volume: Proceedings of the First Workshop on Large Language Model Memorization (L2M2)
Month: August
Year: 2025
Address: Vienna, Austria
Editors: Robin Jia, Eric Wallace, Yangsibo Huang, Tiago Pimentel, Pratyush Maini, Verna Dankers, Johnny Wei, Pietro Lesci
Venues: L2M2 | WS
Publisher: Association for Computational Linguistics
Pages: 183–189
URL: https://preview.aclanthology.org/landing_page/2025.l2m2-1.14/
DOI: 10.18653/v1/2025.l2m2-1.14
Cite (ACL): Ignacio Sastre and Aiala Rosá. 2025. Memory Tokens: Large Language Models Can Generate Reversible Sentence Embeddings. In Proceedings of the First Workshop on Large Language Model Memorization (L2M2), pages 183–189, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Memory Tokens: Large Language Models Can Generate Reversible Sentence Embeddings (Sastre & Rosá, L2M2 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.l2m2-1.14.pdf