Keita Fukushima

2026

Disentangling Meaning and Language Components in Diverse Multilingual Sentence Embeddings
Kanade Nonomura | Keita Fukushima | Risa Kondo | Tomoyuki Kajiwara
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

We disentangle multilingual sentence embeddings into language-dependent and language-agnostic components, leveraging the latter to improve cross-lingual similarity estimation. Previous studies focused on encoder-based approaches that use only the input sentence; in contrast, this study examines the effectiveness of disentanglement methods across a broader range of sentence embeddings, including decoder-based approaches and those that utilize prompts. Experimental results demonstrate that embedding disentanglement is effective for a wide variety of sentence embeddings.
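To illustrate what separating a language-dependent component from a language-agnostic one can look like, here is a minimal sketch. It swaps in per-language mean centering, a simple baseline, and is not the disentanglement method studied in this paper. The toy vectors and the helper names (`language_means`, `remove_language_mean`) are hypothetical and exist only for this example.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def language_means(embeddings, langs):
    """Average embedding per language: a crude estimate of the language-dependent component."""
    sums = {}
    counts = defaultdict(int)
    for vec, lang in zip(embeddings, langs):
        if lang not in sums:
            sums[lang] = [0.0] * len(vec)
        sums[lang] = [s + x for s, x in zip(sums[lang], vec)]
        counts[lang] += 1
    return {lang: [s / counts[lang] for s in total] for lang, total in sums.items()}


def remove_language_mean(vec, lang, means):
    """Residual after subtracting the language mean, treated here as the language-agnostic part."""
    return [x - m for x, m in zip(vec, means[lang])]


# Hypothetical toy embeddings: the first two dimensions carry meaning,
# the third carries a strong language signal (+5 for English, -5 for Japanese).
embeddings = [
    [1.0, 0.0, 5.0],   # en, meaning A
    [0.0, 1.0, 5.0],   # en, meaning B
    [1.0, 0.0, -5.0],  # ja, meaning A
    [0.0, 1.0, -5.0],  # ja, meaning B
]
langs = ["en", "en", "ja", "ja"]

means = language_means(embeddings, langs)
agnostic = [remove_language_mean(v, l, means) for v, l in zip(embeddings, langs)]

raw_sim = cosine(embeddings[0], embeddings[2])   # en A vs ja A, raw vectors
agnostic_sim = cosine(agnostic[0], agnostic[2])  # same pair after centering
```

On the raw toy vectors, the English and Japanese versions of meaning A have a strongly negative cosine similarity because the language dimension dominates; after centering, their residuals point in the same direction. Real encoders mix language and meaning far less cleanly, so a single mean subtraction is only a rough baseline.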

2025


Reversible Disentanglement of Meaning and Language Representations from Multilingual Sentence Encoders
Keita Fukushima | Tomoyuki Kajiwara | Takashi Ninomiya
Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)

We propose an unsupervised method to disentangle sentence embeddings from multilingual sentence encoders into language-specific and language-agnostic representations. The language-agnostic representations distilled by our method can be used to estimate cross-lingual semantic sentence similarity via cosine similarity. Previous studies have trained separate extractors to distill the language-specific and the language-agnostic representations. This approach loses information: the original sentence embedding cannot be fully reconstructed from the language-specific and -agnostic representations, which degrades performance in estimating cross-lingual sentence similarity. We train only the extractor for language-agnostic representations and treat the language-specific representation as the difference from the original sentence embedding; in this way, no information is lost. Experimental results on two tasks, quality estimation of machine translation and cross-lingual sentence similarity estimation, show that our proposed method outperforms existing unsupervised methods.
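The residual formulation in this abstract can be sketched as below. This is a minimal illustration under stated assumptions: the abstract does not specify the extractor's architecture or its unsupervised training objective, so the learned language-agnostic extractor is replaced by a fixed, hypothetical linear map (`W_AGNOSTIC`). The names `extract_agnostic` and `disentangle`, and the toy embeddings, are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical stand-in for a trained extractor: a fixed linear map that keeps
# the first two dimensions and zeroes the third. In practice this would be learned.
W_AGNOSTIC = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]


def extract_agnostic(e):
    """Apply the (hypothetical) language-agnostic extractor to an embedding."""
    return [sum(w * x for w, x in zip(row, e)) for row in W_AGNOSTIC]


def disentangle(e):
    """Split e into (agnostic, specific), with specific defined as the residual."""
    agnostic = extract_agnostic(e)
    # Only the agnostic extractor exists; the language-specific part is whatever
    # is left over, so agnostic + specific reproduces e with nothing missing.
    specific = [x - a for x, a in zip(e, agnostic)]
    return agnostic, specific


e_en = [0.8, 0.6, 5.0]   # hypothetical English sentence embedding
e_ja = [0.8, 0.6, -5.0]  # hypothetical Japanese translation of the same sentence

m_en, l_en = disentangle(e_en)
m_ja, l_ja = disentangle(e_ja)

reconstructed = [m + l for m, l in zip(m_en, l_en)]
cross_lingual_sim = cosine(m_en, m_ja)  # similarity on language-agnostic parts
```

Because the language-specific vector is a residual rather than the output of a second extractor, reconstruction holds by construction for any extractor. With two independently trained extractors, their outputs need not sum back to the original embedding, which is the information loss the abstract describes.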
