Multi-SimLex for Dutch: Benchmarking Embedding- and Prompt-Based Model Performance on Semantic Similarity

Lizzy Brans; Jelke Bloem

Abstract

We introduce Dutch Multi-SimLex, a 1,888-pair extension of the Multi-SimLex benchmark for evaluating lexical semantic similarity in Dutch. The dataset was rated by 100 native speakers on a 0–6 scale and shows high reliability (overall ICC(2,k)=0.82) as well as strong alignment with the English ratings (ρ=0.73). Using this resource, we evaluate eighteen models across four architectural families: static embeddings, encoder-only transformers, encoder–decoders, and decoder-only LLMs. We assess models with two complementary approaches: embedding-based cosine similarity and prompted similarity judgments in Dutch. In the embedding-based evaluation, FastText (ρ=0.485) and the monolingual Dutch encoder BERTje (ρ=0.468) achieve the strongest alignment with human ratings, while multilingual encoders such as mBERT (ρ=0.208) and XLM-R (ρ=0.186) perform markedly worse. Prompt-based evaluation yields substantially higher correlations, with GPT-4 (ρ=0.761) performing best, followed by DeepSeek-V3 (ρ=0.753) and Gemini 1.5 Pro (ρ=0.722). Together, the results show that model performance depends strongly on how meaning is tested. Dutch Multi-SimLex provides a reliable foundation for evaluating meaning across architectures and advancing Dutch semantic evaluation.
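
The embedding-based evaluation described above can be illustrated with a minimal Python sketch: score each word pair by cosine similarity, then compute the Spearman correlation (ρ) between model scores and human ratings. This is not the authors' code. The function names, the three-dimensional vectors, and the ratings below are invented purely for illustration and are not taken from the dataset.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        return float("nan")
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))

def evaluate(pairs, vectors, human_ratings):
    """Correlate cosine scores for word pairs with human ratings."""
    model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
    return spearman(model_scores, human_ratings)

# Hypothetical toy data: vectors and 0-6 ratings are made up.
toy_vectors = {
    "auto": [0.9, 0.1, 0.0],
    "wagen": [0.8, 0.2, 0.1],
    "fiets": [0.5, 0.5, 0.1],
    "boom": [0.0, 0.1, 0.9],
}
toy_pairs = [("auto", "wagen"), ("auto", "fiets"), ("auto", "boom")]
toy_ratings = [5.4, 3.1, 0.3]

rho = evaluate(toy_pairs, toy_vectors, toy_ratings)
print(f"Spearman rho on toy data: {rho:.3f}")
```

Because Spearman's ρ compares rankings rather than raw values, the cosine scores (bounded by 1) and the human ratings (0–6) need no rescaling before they are compared.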

Anthology ID: 2026.lrec-main.380
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 4846–4860
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.380/
Cite (ACL): Lizzy Brans and Jelke Bloem. 2026. Multi-SimLex for Dutch: Benchmarking Embedding- and Prompt-Based Model Performance on Semantic Similarity. International Conference on Language Resources and Evaluation, main:4846–4860.
Cite (Informal): Multi-SimLex for Dutch: Benchmarking Embedding- and Prompt-Based Model Performance on Semantic Similarity (Brans & Bloem, LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.380.pdf
