Disentangling Meaning and Language Components in Diverse Multilingual Sentence Embeddings

Kanade Nonomura, Keita Fukushima, Risa Kondo, Tomoyuki Kajiwara


Abstract
We disentangle multilingual sentence embeddings into language-dependent and language-agnostic components, leveraging the latter to improve cross-lingual similarity estimation. Previous studies focused on encoder-based approaches that use only the input sentence; in contrast, this study examines the effectiveness of disentanglement methods across a broader range of sentence embeddings, including decoder-based approaches and those that utilize prompts. Experimental results demonstrate that embedding disentanglement is effective for a wide variety of sentence embeddings.
Anthology ID:
2026.acl-srw.102
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Santosh T.Y.S.S., Juan Diego Rodriguez, Ona de Gibert
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1169–1176
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-srw.102/
Cite (ACL):
Kanade Nonomura, Keita Fukushima, Risa Kondo, and Tomoyuki Kajiwara. 2026. Disentangling Meaning and Language Components in Diverse Multilingual Sentence Embeddings. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 1169–1176, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Disentangling Meaning and Language Components in Diverse Multilingual Sentence Embeddings (Nonomura et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-srw.102.pdf