Domain Knowledge Distillation for Multilingual Sentence Encoders in Cross-lingual Sentence Similarity Estimation

Risa Kondo, Hiroki Yamauchi, Tomoyuki Kajiwara, Marie Katsurai, Takashi Ninomiya


Abstract
We propose a domain adaptation method for multilingual sentence encoders. In domains requiring a high level of expertise, such as medical and academic, domain-specific pre-trained models have been released in each language. However, there is no its multilingual version, which prevents application to cross-lingual information retrieval. Obviously, multilingual pre-training with developing in-domain corpora in each language is costly. Therefore, we efficiently develop domain-specific cross-lingual sentence encoders from existing multilingual sentence encoders and domain-specific monolingual sentence encoders in each language. Experimental results on translation ranking in three language pairs with different domains reveal the effectiveness of the proposed method compared to baselines without domain adaptation and existing domain adaptation methods.
Anthology ID:
2025.ranlp-1.67
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
SIG:
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Note:
Pages:
572–577
Language:
URL:
https://preview.aclanthology.org/corrections-2026-01/2025.ranlp-1.67/
DOI:
Bibkey:
Cite (ACL):
Risa Kondo, Hiroki Yamauchi, Tomoyuki Kajiwara, Marie Katsurai, and Takashi Ninomiya. 2025. Domain Knowledge Distillation for Multilingual Sentence Encoders in Cross-lingual Sentence Similarity Estimation. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 572–577, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Domain Knowledge Distillation for Multilingual Sentence Encoders in Cross-lingual Sentence Similarity Estimation (Kondo et al., RANLP 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/corrections-2026-01/2025.ranlp-1.67.pdf