ERATE: Efficient Retrieval Augmented Text Embeddings

Vatsal Raina, Nora Kassner, Kashyap Popat, Patrick Lewis, Nicola Cancedda, Louis Martin


Abstract
Embedding representations of text are useful for downstream natural language processing tasks. Several universal sentence representation methods have been proposed, with a particular focus on self-supervised pre-training approaches that leverage the vast quantities of unlabelled data. However, there are two challenges in generating rich embedding representations for a new document: 1) the latest rich embedding generators are based on very large, costly transformer-based architectures; 2) the rich embedding representation of a new document is limited to the information it provides, without access to any explicit contextual or temporal information that could further enrich the representation. We propose efficient retrieval-augmented text embeddings (ERATE), which tackle the first issue and offer a method to tackle the second. To the best of our knowledge, we are the first to incorporate retrieval into general-purpose embeddings as a new paradigm, which we apply to the semantic similarity tasks of SentEval. Despite not reaching state-of-the-art performance, ERATE offers key insights that encourage future work investigating the potential of retrieval-based embeddings.
Anthology ID:
2023.insights-1.2
Volume:
Proceedings of the Fourth Workshop on Insights from Negative Results in NLP
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Shabnam Tafreshi, Arjun Akula, João Sedoc, Aleksandr Drozd, Anna Rogers, Anna Rumshisky
Venues:
insights | WS
Publisher:
Association for Computational Linguistics
Pages:
11–18
URL:
https://aclanthology.org/2023.insights-1.2
DOI:
10.18653/v1/2023.insights-1.2
Cite (ACL):
Vatsal Raina, Nora Kassner, Kashyap Popat, Patrick Lewis, Nicola Cancedda, and Louis Martin. 2023. ERATE: Efficient Retrieval Augmented Text Embeddings. In Proceedings of the Fourth Workshop on Insights from Negative Results in NLP, pages 11–18, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
ERATE: Efficient Retrieval Augmented Text Embeddings (Raina et al., insights-WS 2023)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2023.insights-1.2.pdf
Video:
https://preview.aclanthology.org/dois-2013-emnlp/2023.insights-1.2.mp4