Anne Schuth
2026
A Dutch Benchmark to Assess Social Bias in LLMs within a Hiring Decision Setting
Renate Burema | Anne Schuth | Christopher Spelt | Dong Nguyen
Proceedings of the Fifteenth Language Resources and Evaluation Conference
In this paper, we present a Dutch benchmark to assess whether large language models (LLMs) exhibit social biases in hiring decisions, focusing on gender and country of origin. We experiment with two approaches: describing the applicants’ demographics explicitly and using first names as proxies. We evaluate both monolingual and multilingual LLMs and find that all tested models (gpt-4o-mini, claude-3.5-haiku, Geitje-7B-Ultra and EuroLLM-9B-Instruct) exhibit some degree of social bias in their decisions. Furthermore, all tested models are sensitive to the way the prompts are written. We make our benchmark publicly available under an EUPL-1.2 license at https://github.com/MinBZK/llm-benchmark/tree/main/benchmarks/social-bias.
2019
Tom Jumbo-Grumbo at SemEval-2019 Task 4: Hyperpartisan News Detection with GloVe vectors and SVM
Chia-Lun Yeh | Babak Loni | Anne Schuth
Proceedings of the 13th International Workshop on Semantic Evaluation
In this paper, we describe our attempt to learn bias from news articles. Our experiments suggest that, although publisher bias and article bias are correlated, it is challenging to learn bias directly from the publisher labels. On the other hand, using a few manually labeled samples can increase accuracy from around 60% to nearly 80%. Our system is computationally inexpensive and uses several standard NLP document representations to train an SVM or LR classifier. The system ranked 4th in the SemEval-2019 task. The code is released for reproducibility.
2010
DutchParl. The Parliamentary Documents in Dutch
Maarten Marx | Anne Schuth
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
We present DutchParl, a corpus that aims to contain all digitally available parliamentary documents written in the Dutch language. The first version of DutchParl contains documents from the parliaments of The Netherlands, Flanders and Belgium. The corpus is divided along three dimensions: by parliament, by whether the documents are scanned or digital, and by whether they are written recordings of spoken text or other documents. The digital collection contains more than 800 million tokens, and the scanned collection more than 1 billion. All documents are available as UTF-8 encoded XML files with extensive metadata in the Dublin Core standard. The text itself is divided into pages, which are in turn divided into paragraphs. Every document, page and paragraph has a unique URN which resolves to a web page. Every page element in the XML files is connected to a facsimile image of that page in PDF or JPEG format. We created a viewer in which both versions can be inspected simultaneously. The corpus is available for download in several formats. It can be used for corpus-linguistic and political science research, and is suitable for performing scalability tests for XML information systems.