Adil Bahaj
2026
MizanQA: A Benchmark for Multi-Answer Moroccan Legal QA
Adil Bahaj | Mounir Ghogho
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Adil Bahaj | Mounir Ghogho
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
We present MizanQA, a benchmark for assessing LLMs on Moroccan legal MCQs, many with multiple correct answers. Covering 1,776 expert-verified questions in Modern Standard Arabic enriched with Moroccan idioms, the dataset reflects influences from Maliki jurisprudence, customary law, and French legal traditions. Unlike single-answer settings, MizanQA features variable option counts, creating added difficulty. We evaluate multilingual and Arabic-centric models in zero-shot, native-Arabic prompts, measuring accuracy, a precision-penalized F1-like score, and calibration errors. Results show large performance gaps and miscalibration, particularly under stricter penalties. By scoping this benchmark to parametric knowledge only, we provide a baseline for future retrieval-augmented and rationale-focused setups.