Xing Wei

2026

EduMARS: Can Vision-Language Models Grade Like Teachers? Benchmarking Multimodal, Rubric-Based Assessment on Chinese K-12 Answers
Xuan Zhao | Jiashun Chen | Wanting Xu | Huiyuan Yan | Chaowei Fang | Xing Wei
Findings of the Association for Computational Linguistics: ACL 2026

Automated grading of student work is a critical application of AI in education. However, existing benchmarks fall short in evaluating models on realistic, cognitively demanding tasks. Most rely on synthetic, well-structured text inputs, overlooking the multimodal, error-prone, and often handwritten nature of real student responses, especially in K-12 settings. We introduce EduMARS, a multimodal benchmark designed for rubric-aligned evaluation of real Chinese K-12 student answers. The dataset contains over 4,500 authentic responses from high-stakes exams across eight subjects, featuring noisy handwriting, mixed-layout diagrams, mathematical expressions, and narrative reasoning. Each response is meticulously annotated by expert teachers using step-wise scoring rubrics, error classifications, and key-point mappings, providing fine-grained supervision aligned with real-world pedagogical practices. We evaluate existing SOTA MLLMs on both the final score and the reasoning process of grading, revealing a significant gap between these models and human-level performance. To bridge this gap, we propose Retrieval-Augmented Adaptive-Rubric Grading (RARG), which enables models to emulate expert grading logic by dynamically synthesizing case-specific evaluation schemas. RARG improves the performance and interpretability of various MLLMs on EduMARS, surpassing both in-context learning and chain-of-thought prompting.
