bLLeQA: Benchmarking LLMs for Grounded Legal Question-Answering in French and Dutch

Nikolay Banar; Ehsan Lotfi; Jens Van Nooten; Marija Kliocaite; Walter Daelemans

Abstract

Retrieval-augmented generation (RAG) systems can play an important role in making law more accessible. However, large and reliable resources for training and benchmarking such systems remain scarce, especially for under-resourced languages like Dutch. To address this gap, and building on previous work (Louis et al., 2024), we introduce bLLeQA, a bilingual parallel question-answering dataset grounded in Belgian legal resources in both French and Dutch. The dataset contains aligned questions, answers, and supporting articles in both languages, enabling evaluation of retrieval as well as of end-to-end RAG pipelines. Using bLLeQA, we benchmark the full RAG pipeline in a zero-shot setting, covering retrieval, citation extraction, refusal behavior, and generation quality. Our experiments show that open-weight models are competitive with proprietary models in retrieval and citation extraction but lag behind them in generation quality within the RAG pipeline. Across all models, refusal capability remains weak: models do not reliably detect when the provided supporting sources are incomplete. In addition, the end-to-end RAG setup still yields a substantial share of flawed responses, amounting to 20% even in the best-case scenario.
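To make the "aligned questions, answers, and supporting articles" and the citation-extraction evaluation more concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the released dataset schema or the paper's metric: the record class `BilingualQARecord`, its field names, the article ID strings, and the `citation_scores` helper are all hypothetical, and precision/recall over cited article IDs is just one plausible way to score citation extraction.

```python
from dataclasses import dataclass, field


@dataclass
class BilingualQARecord:
    """Hypothetical layout of one aligned French/Dutch item (not the released schema)."""
    question_fr: str
    question_nl: str
    answer_fr: str
    answer_nl: str
    # Hypothetical IDs of the supporting legal articles, shared by both language versions
    article_ids: set = field(default_factory=set)


def citation_scores(predicted: set, gold: set) -> tuple:
    """Precision and recall of the article IDs a model cites, against the gold articles.

    Assumed convention: if either set is empty, both scores are reported as 0.0.
    """
    if not predicted or not gold:
        return 0.0, 0.0
    hits = len(predicted & gold)  # cited articles that are actually supporting articles
    return hits / len(predicted), hits / len(gold)


record = BilingualQARecord(
    question_fr="...", question_nl="...",
    answer_fr="...", answer_nl="...",
    article_ids={"art_1", "art_2"},
)
# The model cites one correct article and one spurious one
precision, recall = citation_scores({"art_1", "art_3"}, record.article_ids)
print(precision, recall)  # 0.5 0.5
```

Because the supporting articles are aligned across languages, a single set of article IDs can score citations in either the French or the Dutch run of the same question.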

Anthology ID: 2026.knowfm-1.4
Volume: Proceedings of the 4th Workshop on Towards Knowledgeable Foundation Models (KnowFM 2026)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Canyu Chen, Yuji Zhang, Zoey Sha Li, Zihan Wang, Qineng Wang, Jinyan Su, Priyanka Kargupta, Sara Vera Marjanović, Jeff Z. Pan, Mohit Bansal, Isabelle Augenstein, Jiawei Han, Heng Ji, Manling Li
Venues: KnowFM | WS
Publisher: Association for Computational Linguistics
Pages: 34–59
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.knowfm-1.4/
Cite (ACL): Nikolay Banar, Ehsan Lotfi, Jens Van Nooten, Marija Kliocaite, and Walter Daelemans. 2026. bLLeQA: Benchmarking LLMs for Grounded Legal Question-Answering in French and Dutch. In Proceedings of the 4th Workshop on Towards Knowledgeable Foundation Models (KnowFM 2026), pages 34–59, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): bLLeQA: Benchmarking LLMs for Grounded Legal Question-Answering in French and Dutch (Banar et al., KnowFM 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.knowfm-1.4.pdf