Domain Knowledge Distillation for Multilingual Sentence Encoders in Cross-lingual Sentence Similarity Estimation
Risa Kondo, Hiroki Yamauchi, Tomoyuki Kajiwara, Marie Katsurai, Takashi Ninomiya
Abstract
We propose a domain adaptation method for multilingual sentence encoders. In domains requiring a high level of expertise, such as the medical and academic domains, domain-specific pre-trained models have been released for individual languages. However, no multilingual versions of these models exist, which prevents their application to cross-lingual information retrieval. Multilingual pre-training that requires developing in-domain corpora in each language is obviously costly. Therefore, we efficiently develop domain-specific cross-lingual sentence encoders from existing multilingual sentence encoders and domain-specific monolingual sentence encoders in each language. Experimental results on translation ranking in three language pairs with different domains reveal the effectiveness of the proposed method compared to baselines without domain adaptation and to existing domain adaptation methods.
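The abstract does not spell out the training objective, so the sketch below only illustrates one plausible setup in the spirit of multilingual knowledge distillation (Reimers and Gurevych, 2020): a domain-specific monolingual teacher produces target embeddings for in-domain parallel sentences, and a multilingual student is trained to match them on both sides of the pair. The checkpoint names (allenai/scibert_scivocab_uncased as a stand-in domain teacher, xlm-roberta-base as the student), the toy parallel batch, and all hyperparameters are illustrative assumptions, not the configuration used in the paper.

```python
# Hedged sketch: cross-lingual distillation of domain knowledge into a
# multilingual sentence encoder. Checkpoints and data are placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

TEACHER = "allenai/scibert_scivocab_uncased"  # assumed domain-specific monolingual encoder
STUDENT = "xlm-roberta-base"                  # assumed multilingual encoder (same 768-dim hidden size)

teacher_tok = AutoTokenizer.from_pretrained(TEACHER)
student_tok = AutoTokenizer.from_pretrained(STUDENT)
teacher = AutoModel.from_pretrained(TEACHER).eval()
student = AutoModel.from_pretrained(STUDENT).train()

def mean_pool(outputs, attention_mask):
    """Average token embeddings, ignoring padded positions."""
    hidden = outputs.last_hidden_state
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

# Toy stand-in for an in-domain parallel corpus (source, target) batches.
parallel_batches = [
    (["The encoder is adapted with in-domain abstracts."],
     ["El codificador se adapta con resúmenes del dominio."]),
]

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)

for src_sents, tgt_sents in parallel_batches:
    # Teacher embeds the source-language sentence; no gradients needed.
    with torch.no_grad():
        t_in = teacher_tok(src_sents, padding=True, truncation=True, return_tensors="pt")
        t_emb = mean_pool(teacher(**t_in), t_in["attention_mask"])

    # Student is pulled toward the teacher embedding for both languages,
    # so translations end up close to each other in the domain-adapted space.
    loss = 0.0
    for sents in (src_sents, tgt_sents):
        s_in = student_tok(sents, padding=True, truncation=True, return_tensors="pt")
        s_emb = mean_pool(student(**s_in), s_in["attention_mask"])
        loss = loss + F.mse_loss(s_emb, t_emb)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

If the teacher and student hidden sizes differ, a linear projection on the student output would be needed before the MSE term; the sketch assumes matching dimensions for simplicity.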
- Anthology ID:
- 2025.ranlp-1.67
- Volume:
- Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
- Month:
- September
- Year:
- 2025
- Address:
- Varna, Bulgaria
- Editors:
- Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd., Shoumen, Bulgaria
- Pages:
- 572–577
- URL:
- https://preview.aclanthology.org/corrections-2026-01/2025.ranlp-1.67/
- Cite (ACL):
- Risa Kondo, Hiroki Yamauchi, Tomoyuki Kajiwara, Marie Katsurai, and Takashi Ninomiya. 2025. Domain Knowledge Distillation for Multilingual Sentence Encoders in Cross-lingual Sentence Similarity Estimation. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 572–577, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
- Cite (Informal):
- Domain Knowledge Distillation for Multilingual Sentence Encoders in Cross-lingual Sentence Similarity Estimation (Kondo et al., RANLP 2025)
- PDF:
- https://preview.aclanthology.org/corrections-2026-01/2025.ranlp-1.67.pdf