Nico Andersen
2026
Rubrics as Semantic Subspaces: A Unified Approach to Rubric-based Constructed Response Scoring across Short Answers and Essays
Sebastian Gombert | Sonja Hahn | Nico Andersen | Leon Camus | Zhifan Sun | Ngoc Nhu Hao Nguyen | Fabian Zehner | Longwei Cong | Alexander Mehler | Hendrik Drachsler
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Rubrics are the primary reference for manual scoring of constructed responses, and there is growing interest in their use in automated scoring methodologies. In this work, we propose Aspect-Grounded Rubric–Answer Alignment (AGRAA), a rubric-based end-to-end scoring framework that models rubric descriptors as latent aspect spaces. Concretely, rubric descriptors are represented as low-dimensional subspaces derived from contextualised transformer embeddings, and student responses are scored according to how strongly their representations align with these rubric-induced spaces relative to the residual space outside them. This formulation provides a geometrically grounded interpretation of rubric-based scoring while enabling end-to-end training with standard transformer encoders. We introduce three distinct architectural variants and evaluate them on multiple short-answer and essay scoring datasets. Across these tasks, AGRAA achieves predictive performance highly competitive with strong neural and feature-based baselines. In addition, the framework yields interpretable intermediate representations that expose which rubric-defined aspects contribute to scoring decisions, enabling decision-aligned explanations grounded in rubric descriptors.
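The core geometric idea (scoring a response by how much of its representation lies inside a rubric-induced subspace versus the residual space) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. AGRAA is trained end to end with transformer encoders, whereas here the embeddings are random stand-ins and nothing is learned. The sketch also assumes that each descriptor's subspace is spanned by the top-k right singular vectors of its token-embedding matrix, and that the score is the share of the response vector's squared norm falling inside that subspace. The names rubric_subspace and alignment_score are hypothetical.

```python
import numpy as np

def rubric_subspace(descriptor_embeddings: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical: d x k orthonormal basis from the top-k right singular
    vectors of a descriptor's token embeddings (rows = tokens, cols = dims)."""
    _, _, vt = np.linalg.svd(descriptor_embeddings, full_matrices=False)
    return vt[:k].T  # columns are orthonormal directions in embedding space

def alignment_score(response: np.ndarray, basis: np.ndarray) -> float:
    """Assumed score: energy inside the subspace relative to total energy
    (inside + residual). Returns a value in [0, 1]."""
    inside = basis @ (basis.T @ response)  # orthogonal projection onto subspace
    residual = response - inside           # component in the residual space
    e_in, e_out = float(inside @ inside), float(residual @ residual)
    total = e_in + e_out
    return e_in / total if total > 0 else 0.0

# Toy usage with random stand-ins for contextualised embeddings (d = 16).
rng = np.random.default_rng(0)
descriptors = {"level_1": rng.normal(size=(5, 16)), "level_2": rng.normal(size=(5, 16))}
bases = {name: rubric_subspace(E, k=3) for name, E in descriptors.items()}
# A response constructed to lie exactly inside the level_2 subspace.
response = bases["level_2"] @ np.array([1.0, 0.5, -0.2])
scores = {name: alignment_score(response, B) for name, B in bases.items()}
predicted = max(scores, key=scores.get)  # "level_2"
```

Because the projection and the residual are orthogonal, the two energies always sum to the squared norm of the response, so the score can be read as the fraction of the response explained by a rubric aspect. Per-descriptor scores of this kind are also what would make the intermediate representation inspectable.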
2025
Down the Cascades of Omethi: Hierarchical Automatic Scoring in Large-Scale Assessments
Fabian Zehner | Hyo Jeong Shin | Emily Kerzabi | Andrea Horbach | Sebastian Gombert | Frank Goldhammer | Torsten Zesch | Nico Andersen
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
We present the framework Omethi, which is aimed at scoring short text responses in a semi-automatic fashion and is particularly suited to international large-scale assessments. We evaluate its effectiveness on the massively multilingual PISA tests. Responses are passed through a conditional flow of hierarchically combined scoring components to assign a score. Once a score is assigned, hierarchically lower components are discarded. Models implemented in this study ranged from lexical matching of normalized texts (excellent accuracy but weak generalizability) to fine-tuned large language models (lower accuracy but high generalizability). If not scored by any automatic component, responses are passed on to manual scoring. The paper is the first to provide an evaluation of automatic scoring on multilingual PISA data in eleven languages (including Arabic, Finnish, Hebrew, and Kazakh) from three domains (n = 3.8 million responses). On average, results show a manual effort reduction of 71 percent alongside an agreement of κ = .957 when manual scoring is included, and κ = .804 for the automatically scored responses alone. The evaluation underscores the framework's effective adaptivity and operational feasibility: the shares of components used vary substantially across domains and languages while accuracy remains homogeneously high.
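The conditional flow described above (first component to assign a score wins, lower components are skipped, unscored responses go to humans) can be sketched in a few lines of Python. This is an illustrative assumption, not Omethi's code. The normalisation rule, the confidence-threshold wrapper, and all function names (normalize, make_lexical_matcher, make_thresholded_model, cascade_score) are hypothetical, and the "model" in the usage example is a toy keyword rule standing in for a fine-tuned language model.

```python
import unicodedata
from typing import Callable, Optional, Sequence

# A component returns a score, or None to abstain and defer to the next one.
Component = Callable[[str], Optional[int]]

def normalize(text: str) -> str:
    """Hypothetical normalisation: Unicode NFKC, casefold, collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())

def make_lexical_matcher(scored_examples: dict[str, int]) -> Component:
    """Score a response only if its normalised text matches an already scored one."""
    table = {normalize(t): s for t, s in scored_examples.items()}
    return lambda response: table.get(normalize(response))

def make_thresholded_model(predict: Callable[[str], tuple[int, float]],
                           threshold: float) -> Component:
    """Wrap a classifier returning (label, confidence); abstain below threshold."""
    def component(response: str) -> Optional[int]:
        label, confidence = predict(response)
        return label if confidence >= threshold else None
    return component

def cascade_score(response: str,
                  components: Sequence[Component]) -> tuple[Optional[int], str]:
    """Run components in hierarchical order. The first non-abstaining one assigns
    the score and the remaining (lower) components are never called."""
    for i, component in enumerate(components):
        score = component(response)
        if score is not None:  # a score of 0 is still a valid assignment
            return score, f"component_{i}"
    return None, "manual"  # no automatic component fired: route to human scoring

# Toy usage: exact lexical matching first, then a stand-in "model".
lexical = make_lexical_matcher({"Photosynthesis": 1, "I don't know": 0})
toy_model = make_thresholded_model(
    lambda r: (1, 0.9) if "light" in r.lower() else (0, 0.4), threshold=0.8)
pipeline = [lexical, toy_model]
results = [cascade_score(r, pipeline)
           for r in ["  photosynthesis ", "Plants use LIGHT", "no idea"]]
# results == [(1, "component_0"), (1, "component_1"), (None, "manual")]
```

Under this sketch, the manual effort reduction reported in the abstract corresponds to the share of responses whose route is not "manual", and placing high-precision components such as lexical matching first lets the more general but less accurate components handle only what remains.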