Vojtěch John


2024

Unveiling Semantic Information in Sentence Embeddings
Leixin Zhang | David Burian | Vojtěch John | Ondřej Bojar
Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024

This study evaluates the extent to which semantic information is preserved in sentence embeddings generated by state-of-the-art sentence embedding models: SBERT and LaBSE. Specifically, we analyze 13 semantic attributes in sentence embeddings. Our findings indicate that some semantic features (such as tense-related classes) can be decoded from the sentence embeddings. We also identify a limitation of current sentence embedding models: inferring meaning beyond the lexical level proves difficult.