Roos M. Bakker


2026

Ensuring trustworthy and traceable outputs from Large Language Models (LLMs) is crucial in high-stakes domains such as law. Retrieval-Augmented Generation (RAG) offers a way to enhance LLMs with domain-specific or up-to-date information and to attribute answers to their sources, and recent work has focused on knowledge-based RAG (K-RAG) for improved factual grounding. However, proper evaluation of such systems requires high-quality datasets. To address this need, we introduce QuALA-NL, a dataset that provides attributions to legal formalizations, enabling experiments with K-RAG in the legal domain. The dataset contains 101 question-answer (QA) pairs on three Dutch laws, with attributions to the law text and to a formalization of the interpretation of that text. To demonstrate the capabilities of the dataset, we perform experiments with four configurations: LLM-only, RAG using the legal texts, K-RAG using a formalization of the legal texts, and RAG combining both the legal texts and their formalizations. The results show that K-RAG achieves the highest retrieval scores but is outperformed by text-based RAG on generation. A qualitative analysis indicates that the way the knowledge graph is used to generate answers can be improved. QuALA-NL can be used in future work to experiment with knowledge-based Retrieval-Augmented Generation methods.