Danielle S. McNamara
Also published as: Danielle McNamara, Danielle S McNamara
2026
ORSO QGen: Odds-Ratio Steerable Optimization for Controlling Question Generation
Andreea Dutulescu | Stefan Ruseti | Mihai Dascalu | Danielle S. McNamara
Findings of the Association for Computational Linguistics: EACL 2026
Question generation plays an important role in educational applications, enabling automated assessment and reading comprehension support. Attribute-controlled question generation aims to produce questions that fit predefined characteristics such as difficulty, focus, or coverage. Existing methods predominantly rely on supervised fine-tuning, which often fails to impose a strong adherence to attribute values, resulting in weak coupling between prompt specifications and model outputs. We introduce Odds-Ratio Steerable Optimization (ORSO), a framework designed to enhance attribute sensitivity in question generation models. Building upon preference-based learning techniques without requiring human-curated preference sets, ORSO employs input-level perturbations to create contrastive training signals. Empirical evaluations on both exhaustive and expert-validated attribute configurations indicate that ORSO performs better in enforcing attribute conformity while maintaining output quality. These results argue for the benefits of explicit attribute-aware optimization in controllable question generation tasks.
Modeling Writing Development as Coordinated Change Across Linguistic and Semantic Dimensions
Michelle Banawan | Andrew Potter | Tracy Arner | Danielle S McNamara
Proceedings of the 1st Workshop on Computational Developmental Linguistics (CDL)
Writing development is often assessed through aggregate improvements in surface-level features, yet less attention has been given to how multiple linguistic dimensions evolve jointly over time. We model writing development as a multidimensional system shaped by stable individual variation and instructional progression across staged assignments, using interpretable linguistic features from the Writing Analytics Toolkit (WAT) and transformer-based sentence embeddings. Variance partitioning reveals substantial between-student stability alongside stage-dependent change. Mixed-effects models identify non-uniform developmental trajectories: academic focus, information density, and conventional language increase, whereas development of ideas and lexical variety decline, indicating tradeoffs across competing dimensions. Cross-lagged analyses further show dynamic dependencies between dimensions, suggesting coordinated change rather than independent progression. Embedding-based analyses capture stage-dependent shifts in semantic representation, with larger changes in earlier stages and increasing stability over time. Although assignment structure contributes to observed variation, stable individual differences and cross-stage dependencies indicate underlying developmental processes that generalize across tasks. Together, these findings characterize writing development as structured change in a multidimensional representational system, highlighting the need for computational models that capture stable variation, non-monotonic trajectories, and interactions among linguistic components.
EduMUSE: A Multimodal Educational Dataset with Automatically Extracted Instructional Context
Andreea Dutulescu | Stefan Ruseti | Mihai Dascalu | Danielle McNamara
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Research in AI applied to education increasingly relies on large-scale, high-quality datasets to support the development and evaluation of learning analytics and intelligent educational systems. Open educational resources provide a promising foundation, yet few datasets integrate structured instructional content with assessment materials in a multimodal form. In this study, we introduce a large-scale multimodal educational dataset (EduMUSE: Educational Multimodal Understanding & Solution Dataset) constructed from OpenStax undergraduate textbooks across multiple domains. The dataset integrates hierarchically structured instructional text, figures, exercises, and, when available, official solutions. For exercises with solutions, we introduce an automatic method that associates each exercise with a focused instructional subsection rather than entire textbook chapters, estimating subsection relevance via solution likelihood under candidate contexts using a vision–language model. We analyze how contextualization affects the behavior of vision–language models when they are given different instructional contexts. Results indicate that subsection-level instructional context has a measurable impact on model performance, with variation across model scales and task formulations. The dataset and code are released as open source at https://github.com/upb-nlp/BEA-EduMUSE/ to support reproducible research in multimodal educational modeling and to facilitate generating similar datasets using our approach.
2013
Native Language Identification: A Key N-gram Category Approach
Kristopher Kyle | Scott Crossley | Jianmin Dai | Danielle S. McNamara
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications