Bernt Ivar Utstøl Nødland
2023
Training and Evaluating Norwegian Sentence Embedding Models
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
We train and evaluate Norwegian sentence embedding models using the SimCSE contrastive learning method. Starting from pre-trained Norwegian encoder models, we train both unsupervised and supervised variants. The models are evaluated on machine-translated versions of semantic textual similarity datasets, as well as on binary classification tasks. We show that we can train good Norwegian sentence embedding models that clearly outperform both the pre-trained encoder models and the multilingual mBERT on the task of sentence similarity.
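For readers unfamiliar with the training signal the abstract refers to, the sketch below illustrates the standard unsupervised SimCSE objective: the same sentences are encoded twice in train mode, so dropout noise produces two slightly different embeddings per sentence that serve as a positive pair, while the other sentences in the batch act as negatives. This is a minimal illustration, not the paper's exact setup; the encoder interface (a Hugging Face-style model), CLS-token pooling, and the temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def unsupervised_simcse_loss(encoder, input_ids, attention_mask, temperature=0.05):
    """In-batch contrastive loss for unsupervised SimCSE (illustrative sketch).

    Assumptions: `encoder` is a Hugging Face-style model returning
    `last_hidden_state`; CLS pooling and temperature=0.05 are not taken
    from the paper.
    """
    # Encoding the same batch twice in train mode gives two embeddings per
    # sentence that differ only by dropout noise; these form positive pairs.
    h1 = encoder(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
    h2 = encoder(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]

    # Pairwise cosine similarities between the two views, scaled by temperature.
    sim = F.cosine_similarity(h1.unsqueeze(1), h2.unsqueeze(0), dim=-1) / temperature

    # For sentence i, the second view of sentence i is the positive;
    # every other sentence in the batch is an in-batch negative.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```

The supervised variant mentioned in the abstract uses the same loss structure but replaces the dropout-based positives with labeled positive (and, typically, hard negative) sentence pairs.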