bLLeQA: Benchmarking LLMs for Grounded Legal Question-Answering in French and Dutch
Nikolay Banar, Ehsan Lotfi, Jens Van Nooten, Marija Kliocaite, Walter Daelemans
Abstract
Retrieval-augmented generation (RAG) systems can play an important role in making law more accessible. However, large and reliable resources for training and benchmarking such systems remain scarce, especially for under-resourced languages like Dutch. To address this gap, and building on previous work (Louis et al., 2024), we introduce bLLeQA, a bilingual parallel question-answering dataset grounded in Belgian legal resources in both French and Dutch. The dataset contains aligned questions, answers, and supporting articles in both languages, enabling evaluation of both retrieval and end-to-end RAG pipelines. Using bLLeQA, we benchmark the full RAG pipeline in a zero-shot setting, covering retrieval, citation extraction, refusal behavior, and generation quality. Our experiments show that open-weight models are competitive with proprietary models in retrieval and citation extraction, but lag behind them in generation quality within the RAG pipeline. Across all models, refusal capability remains weak, meaning that models do not reliably detect when the provided supporting sources are incomplete. In addition, the end-to-end RAG setup still yields a substantial share of flawed responses, reaching 20% even in the best-case scenario.
- Anthology ID:
- 2026.knowfm-1.4
- Volume:
- Proceedings of the 4th Workshop on Towards Knowledgeable Foundation Models (KnowFM 2026)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Canyu Chen, Yuji Zhang, Zoey Sha Li, Zihan Wang, Qineng Wang, Jinyan Su, Priyanka Kargupta, Sara Vera Marjanović, Jeff Z. Pan, Mohit Bansal, Isabelle Augenstein, Jiawei Han, Heng Ji, Manling Li
- Venues:
- KnowFM | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 34–59
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.knowfm-1.4/
- Cite (ACL):
- Nikolay Banar, Ehsan Lotfi, Jens Van Nooten, Marija Kliocaite, and Walter Daelemans. 2026. bLLeQA: Benchmarking LLMs for Grounded Legal Question-Answering in French and Dutch. In Proceedings of the 4th Workshop on Towards Knowledgeable Foundation Models (KnowFM 2026), pages 34–59, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- bLLeQA: Benchmarking LLMs for Grounded Legal Question-Answering in French and Dutch (Banar et al., KnowFM 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.knowfm-1.4.pdf