MTEB-NL and E5-NL: Embedding Benchmark and Models for Dutch

Nikolay Banar, Ehsan Lotfi, Jens Van Nooten, Cristina Arhiliuc, Marija Kliocaite, Walter Daelemans


Abstract
Recently, embedding resources, including models, benchmarks, and datasets, have been released for a wide variety of languages. However, Dutch remains underrepresented, typically comprising only a small fraction of published multilingual resources. To address this gap and encourage further development of Dutch embeddings, we introduce new resources for both their evaluation and their generation. First, we introduce the Massive Text Embedding Benchmark for Dutch (MTEB-NL), which includes both existing and newly created Dutch datasets covering a wide range of tasks. Second, we provide a training dataset compiled from available Dutch retrieval datasets and complemented with synthetic data generated by large language models, which expands task coverage beyond retrieval. Finally, we release E5-NL, a series of compact yet efficient embedding models that perform strongly across multiple tasks. We make our resources publicly available through the Hugging Face Hub and the MTEB package.
Anthology ID:
2026.findings-acl.1236
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
24684–24709
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1236/
Cite (ACL):
Nikolay Banar, Ehsan Lotfi, Jens Van Nooten, Cristina Arhiliuc, Marija Kliocaite, and Walter Daelemans. 2026. MTEB-NL and E5-NL: Embedding Benchmark and Models for Dutch. In Findings of the Association for Computational Linguistics: ACL 2026, pages 24684–24709, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
MTEB-NL and E5-NL: Embedding Benchmark and Models for Dutch (Banar et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1236.pdf
Checklist:
 2026.findings-acl.1236.checklist.pdf