Measuring Idiomaticity in Text Embedding Models with epsilon-compositionality

Sondre Wold, Étienne Simon, Erik Velldal, Lilja Øvrelid


Abstract
The principle of compositionality, which concerns the construction of meaning from constituent parts, is a longstanding topic in various disciplines, most commonly associated with formal semantics. In NLP, recent studies have focused on the compositional properties of text embedding models, particularly regarding their sensitivity to idiomatic expression, as idioms have traditionally been seen as non-compositional. In this paper, we argue that it is unclear how previous work relates to formal definitions of the principle. To address this limitation, we take a theoretically motivated approach based on definitions in formal semantics. We present 𝜀-compositionality, a continuous relaxation of compositionality derived from these definitions. We measure 𝜀-compositionality on a dataset containing both idiomatic and non-idiomatic sentences, providing a theoretically motivated assessment of sensitivity to idiomaticity. Our findings indicate that most text embedding models differentiate between idiomatic and non-idiomatic phrases, although to varying degrees.
Anthology ID:
2026.eacl-long.99
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
2239–2252
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.99/
Cite (ACL):
Sondre Wold, Étienne Simon, Erik Velldal, and Lilja Øvrelid. 2026. Measuring Idiomaticity in Text Embedding Models with epsilon-compositionality. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2239–2252, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Measuring Idiomaticity in Text Embedding Models with epsilon-compositionality (Wold et al., EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.99.pdf