Vec2Sent: Probing Sentence Embeddings with Natural Language Generation

Martin Kerscher, Steffen Eger


Abstract
We introspect black-box sentence embeddings by conditionally generating from them with the objective of retrieving the underlying discrete sentence. We view this as a new unsupervised probing task and show that it correlates well with downstream task performance. We also illustrate how the language generated from different encoders differs. We apply our approach to generate sentence analogies from sentence embeddings.
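The abstract's last step, generating sentence analogies from embeddings, can be illustrated with a small sketch. It assumes a vector-offset formulation ("a is to b as c is to ?" answered by decoding b - a + c). That is an assumption on our part, since the abstract does not state the exact formula. The paper decodes with a trained conditional generator. Here that generator is replaced by a hypothetical nearest-neighbour lookup (`decode_nearest`) over a tiny bank of hand-made toy embeddings (`BANK`). The function names, the feature layout, and the vectors are all illustrative and are not taken from the maruker/vec2sent code.

```python
import math
from typing import Dict, Iterable, List

Vector = List[float]


def analogy_target(a: Vector, b: Vector, c: Vector) -> Vector:
    """Assumed vector offset for 'a is to b as c is to ?': b - a + c."""
    return [bi - ai + ci for ai, bi, ci in zip(a, b, c)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def decode_nearest(query: Vector, bank: Dict[str, Vector],
                   exclude: Iterable[str] = ()) -> str:
    """Hypothetical stand-in for a trained generator: return the stored
    sentence whose embedding is most cosine-similar to the query."""
    skip = set(exclude)
    candidates = {s: v for s, v in bank.items() if s not in skip}
    return max(candidates, key=lambda s: cosine(query, candidates[s]))


# Hypothetical toy embeddings over the features [he, she, walks, runs, sleeps].
BANK: Dict[str, Vector] = {
    "he walks":   [1.0, 0.0, 1.0, 0.0, 0.0],
    "he runs":    [1.0, 0.0, 0.0, 1.0, 0.0],
    "he sleeps":  [1.0, 0.0, 0.0, 0.0, 1.0],
    "she walks":  [0.0, 1.0, 1.0, 0.0, 0.0],
    "she runs":   [0.0, 1.0, 0.0, 1.0, 0.0],
    "she sleeps": [0.0, 1.0, 0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    # "he walks" is to "he runs" as "she walks" is to ?
    target = analogy_target(BANK["he walks"], BANK["he runs"], BANK["she walks"])
    # Exclude the three input sentences, as is common in analogy evaluation.
    print(decode_nearest(target, BANK, exclude={"he walks", "he runs", "she walks"}))
    # prints: she runs
```

With real encoders the offset vector rarely lands exactly on a stored sentence. A generator that decodes the offset vector directly, rather than the lookup used here, is what lets the method produce sentences that were never in any bank.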
Anthology ID:
2020.coling-main.152
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
1729–1736
URL:
https://aclanthology.org/2020.coling-main.152
DOI:
10.18653/v1/2020.coling-main.152
Cite (ACL):
Martin Kerscher and Steffen Eger. 2020. Vec2Sent: Probing Sentence Embeddings with Natural Language Generation. In Proceedings of the 28th International Conference on Computational Linguistics, pages 1729–1736, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Vec2Sent: Probing Sentence Embeddings with Natural Language Generation (Kerscher & Eger, COLING 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.coling-main.152.pdf
Code:
maruker/vec2sent
Data:
SentEval