Abstract
This study evaluates the extent to which semantic information is preserved in sentence embeddings generated by two state-of-the-art sentence embedding models: SBERT and LaBSE. Specifically, we analyze 13 semantic attributes in sentence embeddings. Our findings indicate that some semantic features, such as tense-related classes, can be decoded from sentence embedding representations. We also identify a limitation of current sentence embedding models: inferring meaning beyond the lexical level proves difficult.
- Anthology ID:
- 2024.dmr-1.5
- Volume:
- Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Claire Bonial, Julia Bonn, Jena D. Hwang
- Venues:
- DMR | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 39–47
- URL:
- https://aclanthology.org/2024.dmr-1.5
- Cite (ACL):
- Leixin Zhang, David Burian, Vojtěch John, and Ondřej Bojar. 2024. Unveiling Semantic Information in Sentence Embeddings. In Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024, pages 39–47, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Unveiling Semantic Information in Sentence Embeddings (Zhang et al., DMR-WS 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.dmr-1.5.pdf