LuxDiagRC: A Diagnostic Reading Comprehension Corpus for Luxembourgish with Linguistic and Cognitive Annotation Layers
Christophe Friezas Gonçalves, Salima Lamsiyah, Christoph Schommer
Abstract
Reading comprehension resources for low-resource languages remain limited, particularly datasets designed for educational assessment and diagnostic analysis rather than binary correctness. We present a diagnostically rich reading comprehension corpus for Luxembourgish, annotated using a two-layer framework that separates linguistic sources of textual difficulty from cognitive and diagnostic properties of comprehension questions. The linguistic layer captures span-level lexical, syntactic, morphological, and discourse-related features, while the cognitive layer annotates multiple-choice questions according to the PIRLS cognitive processes and diagnostically meaningful distractor types following the STARC framework. This design enables fine-grained analysis of reading comprehension errors by linking response patterns to underlying linguistic phenomena. The resulting corpus consists of 640 multiple-choice questions based on 16 annotated Luxembourgish texts. We describe the annotation methodology and agreement measures, and will release the dataset as a publicly available resource for educational and low-resource NLP research.
- Anthology ID:
- 2026.loreslm-1.46
- Volume:
- Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Editors:
- Hansi Hettiarachchi, Tharindu Ranasinghe, Alistair Plum, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
- Venue:
- LoResLM
- Publisher:
- Association for Computational Linguistics
- Pages:
- 532–541
- URL:
- https://preview.aclanthology.org/manual-author-scripts/2026.loreslm-1.46/
- Cite (ACL):
- Christophe Friezas Gonçalves, Salima Lamsiyah, and Christoph Schommer. 2026. LuxDiagRC: A Diagnostic Reading Comprehension Corpus for Luxembourgish with Linguistic and Cognitive Annotation Layers. In Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026), pages 532–541, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- LuxDiagRC: A Diagnostic Reading Comprehension Corpus for Luxembourgish with Linguistic and Cognitive Annotation Layers (Gonçalves et al., LoResLM 2026)
- PDF:
- https://preview.aclanthology.org/manual-author-scripts/2026.loreslm-1.46.pdf