A Dutch Benchmark to Assess Social Bias in LLMs within a Hiring Decision Setting

Renate Burema, Anne Schuth, Christopher Spelt, Dong Nguyen


Abstract
In this paper, we present a Dutch benchmark to assess whether large language models (LLMs) exhibit social biases in hiring decisions, focusing on gender and country of origin. We experiment with two approaches: explicitly describing the applicants’ demographics and using first names as proxies. We evaluate both monolingual and multilingual LLMs and find that all tested models, gpt-4o-mini, claude-3.5-haiku, Geitje-7B-Ultra, and EuroLLM-9B-Instruct, exhibit some degree of social bias in their decisions. Furthermore, all tested models are sensitive to how the prompts are written. We make our benchmark publicly available under an EUPL-1.2 license at https://github.com/MinBZK/llm-benchmark/tree/main/benchmarks/social-bias.
Anthology ID:
2026.lrec-main.312
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
3932–3943
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.312/
Cite (ACL):
Renate Burema, Anne Schuth, Christopher Spelt, and Dong Nguyen. 2026. A Dutch Benchmark to Assess Social Bias in LLMs within a Hiring Decision Setting. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 3932–3943, Palma de Mallorca, Spain. ELRA Language Resource Association.
Cite (Informal):
A Dutch Benchmark to Assess Social Bias in LLMs within a Hiring Decision Setting (Burema et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.312.pdf