The Aftermath of DrawEduMath: Vision Language Models Underperform with Struggling Students and Misdiagnose Errors
Li Lucy, Albert Zhang, Nathan Anderson, Ryan Knight, Kyle Lo
Abstract
Effective mathematics education requires identifying and responding to students’ mistakes. For AI to support pedagogical applications, models must perform well across different levels of student proficiency. Our work provides an extensive, year-long snapshot of how 11 vision-language models (VLMs) perform on DrawEduMath, a QA benchmark involving real students’ handwritten, hand-drawn responses to math problems. We find that models’ weaknesses concentrate on a core component of math education: student error. All evaluated VLMs underperform when describing work from students who may require more pedagogical help, and across all QA, they struggle the most on questions related to assessing student error. Thus, while VLMs may be optimized to be math problem solving experts, our results suggest that they require alternative development incentives to adequately support educational use cases.
- Anthology ID:
- 2026.bea-1.5
- Volume:
- Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, USA
- Editors:
- Ekaterina Kochmar, Bashar Alhafni, Stefano Bannò, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anais Tack, Victoria Yaneva, Zheng Yuan
- Venues:
- BEA | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 48–62
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.5/
- Cite (ACL):
- Li Lucy, Albert Zhang, Nathan Anderson, Ryan Knight, and Kyle Lo. 2026. The Aftermath of DrawEduMath: Vision Language Models Underperform with Struggling Students and Misdiagnose Errors. In Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026), pages 48–62, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal):
- The Aftermath of DrawEduMath: Vision Language Models Underperform with Struggling Students and Misdiagnose Errors (Lucy et al., BEA 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.5.pdf