Semantic Geometry of Sentence Embeddings

Matthieu Tehenan


Abstract
Sentence embeddings are central to modern natural language processing, powering tasks such as clustering, semantic search, and retrieval-augmented generation. Yet, they remain largely opaque: their internal features are not directly interpretable, and users lack fine-grained control for downstream tasks. To address this issue, we introduce a formal framework to characterize the organization of features in sentence embeddings through information-theoretic means. Building on this foundation, we develop a method to identify interpretable feature directions and show how they can be composed to capture richer semantic structures. Experiments on both synthetic and real-world datasets confirm the presence of this semantic geometry and highlight the utility of our approach for enhancing interpretability and fine-grained control in sentence embeddings.
Anthology ID:
2025.findings-emnlp.641
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11993–12004
URL:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.641/
DOI:
10.18653/v1/2025.findings-emnlp.641
Cite (ACL):
Matthieu Tehenan. 2025. Semantic Geometry of Sentence Embeddings. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 11993–12004, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Semantic Geometry of Sentence Embeddings (Tehenan, Findings 2025)
PDF:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.641.pdf
Checklist:
2025.findings-emnlp.641.checklist.pdf