Reversible Disentanglement of Meaning and Language Representations from Multilingual Sentence Encoders

Keita Fukushima, Tomoyuki Kajiwara, Takashi Ninomiya


Abstract
We propose an unsupervised method to disentangle sentence embeddings from multilingual sentence encoders into language-specific and language-agnostic representations. The language-agnostic representations distilled by our method can estimate cross-lingual semantic sentence similarity via cosine similarity. Previous studies have trained separate extractors to distill the language-specific and the language-agnostic representation. This approach suffers from information loss: the original sentence embedding cannot be fully reconstructed from the two representations, which degrades performance in estimating cross-lingual sentence similarity. We instead train only the extractor for the language-agnostic representation and treat the language-specific representation as the difference from the original sentence embedding, so no information is lost. Experimental results on two tasks, quality estimation of machine translation and cross-lingual sentence similarity estimation, show that our proposed method outperforms existing unsupervised methods.
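The residual construction in the abstract can be illustrated with a minimal sketch. Here the trained language-agnostic extractor is replaced by a placeholder linear projection `W` (a hypothetical stand-in; the paper trains the extractor without supervision). The key property is that the language-specific part is defined as the residual, so the two parts always sum back to the original embedding exactly, and cross-lingual similarity is scored by cosine similarity between the language-agnostic parts.

```python
import numpy as np

def extract_meaning(embedding: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Hypothetical language-agnostic extractor: a linear projection.

    In the paper this is a trained neural extractor; a square matrix W
    is used here only to make the residual construction concrete.
    """
    return W @ embedding

def disentangle(embedding: np.ndarray, W: np.ndarray):
    """Split an embedding into (language-agnostic, language-specific) parts."""
    meaning = extract_meaning(embedding, W)
    # The language-specific part is the residual, so
    # meaning + language == embedding holds by construction
    # (reversible: no information is lost).
    language = embedding - meaning
    return meaning, language

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity used to score cross-lingual semantic similarity."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Given embeddings of a sentence and its translation, one would compare `cosine(meaning_en, meaning_de)` rather than the raw embeddings, the idea being that the language-specific residual has been factored out.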
Anthology ID:
2025.mrl-main.18
Volume:
Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
David Ifeoluwa Adelani, Catherine Arnett, Duygu Ataman, Tyler A. Chang, Hila Gonen, Rahul Raja, Fabian Schmidt, David Stap, Jiayi Wang
Venues:
MRL | WS
Publisher:
Association for Computational Linguistics
Pages:
265–270
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.mrl-main.18/
DOI:
10.18653/v1/2025.mrl-main.18
Cite (ACL):
Keita Fukushima, Tomoyuki Kajiwara, and Takashi Ninomiya. 2025. Reversible Disentanglement of Meaning and Language Representations from Multilingual Sentence Encoders. In Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025), pages 265–270, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Reversible Disentanglement of Meaning and Language Representations from Multilingual Sentence Encoders (Fukushima et al., MRL 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.mrl-main.18.pdf