Keita Fukushima

2025

Reversible Disentanglement of Meaning and Language Representations from Multilingual Sentence Encoders
Keita Fukushima | Tomoyuki Kajiwara | Takashi Ninomiya
Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)

We propose an unsupervised method to disentangle sentence embeddings from multilingual sentence encoders into language-specific and language-agnostic representations. The language-agnostic representations distilled by our method can be used to estimate cross-lingual semantic sentence similarity via cosine similarity. Previous studies have trained separate extractors to distill the language-specific and language-agnostic representations. That approach loses information: the original sentence embedding cannot be fully reconstructed from the two representations, which degrades performance in estimating cross-lingual sentence similarity. We instead train only the extractor for language-agnostic representations and treat the language-specific representation as the difference between the original sentence embedding and the language-agnostic representation, so that no information is lost. Experimental results on two tasks, quality estimation of machine translation and cross-lingual sentence similarity estimation, show that our proposed method outperforms existing unsupervised methods.
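
The reversible split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the extractor here is a hypothetical random linear map standing in for a trained model, the embeddings are random vectors rather than outputs of a multilingual sentence encoder, and all names (`disentangle`, `extract_meaning`, `cosine_similarity`) are assumptions made for illustration. It only shows that, when the language-specific part is defined as a residual, the two parts sum back to the original embedding.

```python
import numpy as np


def disentangle(embedding, extract_meaning):
    """Split an embedding into (language-agnostic, language-specific) parts.

    The language-specific part is the residual, so the two parts always
    sum back to the original embedding (up to floating-point rounding).
    """
    meaning = extract_meaning(embedding)
    language = embedding - meaning
    return meaning, language


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
dim = 8

# Hypothetical stand-in for a trained language-agnostic extractor.
W = rng.normal(size=(dim, dim))


def extract_meaning(embedding):
    return W @ embedding


# Random placeholders for source- and target-language sentence embeddings.
src = rng.normal(size=dim)
tgt = rng.normal(size=dim)

src_meaning, src_language = disentangle(src, extract_meaning)
tgt_meaning, tgt_language = disentangle(tgt, extract_meaning)

# Reversibility: the original embedding is recovered from the two parts.
reconstructed = src_meaning + src_language

# Cross-lingual similarity is scored on the language-agnostic parts only.
score = cosine_similarity(src_meaning, tgt_meaning)
```

With a real encoder and a trained extractor, `score` would serve as the cross-lingual similarity estimate; with the random placeholders above it carries no meaning beyond demonstrating the computation.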
