EduMUSE: A Multimodal Educational Dataset with Automatically Extracted Instructional Context

Andreea Dutulescu; Stefan Ruseti; Mihai Dascalu; Danielle S. McNamara

Abstract

Research in AI applied to education increasingly relies on large-scale, high-quality datasets to support the development and evaluation of learning analytics and intelligent educational systems. Open educational resources provide a promising foundation, yet few datasets integrate structured instructional content with assessment materials in a multimodal form. In this study, we introduce a large-scale multimodal educational dataset (EduMUSE - Educational Multimodal Understanding & Solution Dataset) constructed from OpenStax undergraduate textbooks across multiple domains. The dataset integrates hierarchically structured instructional text, figures, exercises, and, when available, official solutions. For exercises with solutions, we introduce an automatic method that associates each exercise with a focused instructional subsection rather than entire textbook chapters, estimating subsection relevance via solution likelihood under candidate contexts using a vision–language model. We analyze the impact of contextualization on the behavior of vision–language models across different contexts. Results indicate that subsection-level instructional context has a measurable impact on model performance, with variation across model scales and task formulations. The dataset and code are released as open source at https://github.com/upb-nlp/BEA-EduMUSE/ to support reproducible research in multimodal educational modeling and to facilitate generating similar datasets using our approach.
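The subsection-association step described in the abstract can be illustrated with a minimal sketch: score each candidate subsection by how likely the official solution is when that subsection is given as context, then keep the highest-scoring one. This is not the authors' implementation. The function names (`rank_subsections`, `toy_log_likelihood`) and the example data are hypothetical, and the toy scorer is a smoothed unigram model standing in for the vision–language model, which would instead return the log-likelihood of the solution tokens conditioned on the subsection text, its figures, and the exercise.

```python
import math
from typing import Callable, Dict, Tuple


def rank_subsections(
    exercise: str,
    solution: str,
    subsections: Dict[str, str],
    log_likelihood: Callable[[str, str, str], float],
) -> Tuple[str, Dict[str, float]]:
    """Return the id of the subsection under which the solution is most likely,
    together with the score of every candidate."""
    scores = {
        sid: log_likelihood(text, exercise, solution)
        for sid, text in subsections.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


def toy_log_likelihood(context: str, exercise: str, solution: str) -> float:
    """Hypothetical stand-in for a model scorer (assumption, not from the paper):
    add-one-smoothed unigram log-probability of the solution tokens under the
    context. The exercise argument is unused by this toy scorer."""
    ctx_tokens = context.lower().split()
    counts: Dict[str, int] = {}
    for tok in ctx_tokens:
        counts[tok] = counts.get(tok, 0) + 1
    denom = len(ctx_tokens) + len(counts) + 1  # tokens + vocabulary + unseen slot
    return sum(
        math.log((counts.get(tok, 0) + 1) / denom)
        for tok in solution.lower().split()
    )


# Illustrative data only; not taken from the dataset.
subsections = {
    "2.1": "velocity is displacement divided by time",
    "2.2": "cells contain a nucleus and mitochondria",
}
best, scores = rank_subsections(
    exercise="How is velocity defined?",
    solution="velocity equals displacement divided by time",
    subsections=subsections,
    log_likelihood=toy_log_likelihood,
)
print(best)
```

Passing the scorer in as a callable keeps the selection logic independent of the model, so the toy function can be swapped for a real model-based likelihood without changing `rank_subsections`.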

Anthology ID: 2026.bea-1.25
Volume: Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Bashar Alhafni, Stefano Bannò, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anais Tack, Victoria Yaneva, Zheng Yuan
Venues: BEA | WS
Publisher: Association for Computational Linguistics
Pages: 358–367
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.25/
Cite (ACL): Andreea Dutulescu, Stefan Ruseti, Mihai Dascalu, and Danielle McNamara. 2026. EduMUSE: A Multimodal Educational Dataset with Automatically Extracted Instructional Context. In Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026), pages 358–367, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): EduMUSE: A Multimodal Educational Dataset with Automatically Extracted Instructional Context (Dutulescu et al., BEA 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.25.pdf
