Allison Bradford


2024

Building Robust Content Scoring Models for Student Explanations of Social Justice Science Issues
Allison Bradford | Kenneth Steimel | Brian Riordan | Marcia Linn
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

With increased attention to connecting science topics to real-world contexts, like issues of social justice, teachers need support to assess student progress in explaining such issues. In this work, we explore the robustness of NLP-based automatic content scoring models that provide insight into student ability to integrate their science and social justice ideas in two different environmental science contexts. We leverage encoder-only transformer models to capture the degree to which students explain a science phenomenon, understand the intersecting justice issues, and integrate their understanding of science and social justice. We developed models trained on data from each context as well as on a combined dataset. We found that the models developed in one context generate educationally useful scores in the other context. The model trained on the combined dataset performed as well as or better than the models trained on separate datasets in most cases. Quadratic weighted kappas demonstrate that these models are above the threshold for classroom use.
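As a rough illustration of the pipeline this abstract describes, the sketch below shows the scoring-and-evaluation shape: an encoder-only transformer classifier assigns rubric scores to student explanations, and agreement with human raters is measured by quadratic weighted kappa (QWK). The checkpoint name, the 0-4 rubric, and all example responses are illustrative assumptions, not the authors' data or code, and the untuned classification head here would produce arbitrary scores until fine-tuned.

```python
import torch
from sklearn.metrics import cohen_kappa_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_SCORE_LEVELS = 5  # assumption: a 0-4 content rubric

# Assumption: a generic BERT checkpoint stands in for whatever
# encoder the authors actually fine-tuned.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_SCORE_LEVELS
)
model.eval()

def predict_scores(responses):
    """Score each student explanation by taking the argmax class."""
    batch = tokenizer(responses, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()

# Illustrative responses and human rubric scores (not real data).
responses = [
    "Neighborhoods with less tree cover trap more heat in summer.",
    "It gets hot because the sun shines.",
    "Pavement absorbs sunlight and re-emits it as heat, which most "
    "affects residents who cannot afford air conditioning.",
]
human_scores = [3, 1, 4]

machine_scores = predict_scores(responses)
# QWK penalizes large disagreements more than near-misses,
# matching how ordinal rubric scores are compared.
qwk = cohen_kappa_score(human_scores, machine_scores, weights="quadratic")
print(f"QWK vs. human raters: {qwk:.2f}")
```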

2020

An empirical investigation of neural methods for content scoring of science explanations
Brian Riordan | Sarah Bichler | Allison Bradford | Jennifer King Chen | Korah Wiley | Libby Gerard | Marcia C. Linn
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

With the widespread adoption of the Next Generation Science Standards (NGSS), science teachers and online learning environments face the challenge of evaluating students’ integration of different dimensions of science learning. Recent advances in representation learning in natural language processing have proven effective across many natural language processing tasks, but a rigorous evaluation of the relative merits of these methods for scoring complex constructed response formative assessments has not previously been carried out. We present a detailed empirical investigation of feature-based, recurrent neural network, and pre-trained transformer models on scoring content in real-world formative assessment data. We demonstrate that recent neural methods can rival or exceed the performance of feature-based methods. We also provide evidence that different classes of neural models take advantage of different learning cues, and pre-trained transformer models may be more robust to spurious, dataset-specific learning cues, better reflecting scoring rubrics.
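To make the "feature-based" end of the comparison concrete, here is a minimal sketch of that class of method: n-gram features with a linear classifier. The feature set, training data, and rubric scores are illustrative assumptions; the paper's actual feature-based system is not specified here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative explanations and rubric scores (not the paper's data).
train_texts = [
    "Plants make food from sunlight through photosynthesis.",
    "The plant grows because it wants to.",
    "Chlorophyll absorbs light energy, which drives glucose production.",
    "I do not know.",
]
train_scores = [2, 1, 3, 0]

# Word unigram/bigram TF-IDF features feeding a linear classifier:
# a simple stand-in for a feature-based scoring baseline.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_scores)

print(baseline.predict(["Photosynthesis uses light to make sugar."]))
```

A baseline like this relies entirely on surface lexical cues, which is one reason such models can latch onto spurious, dataset-specific signals that the abstract suggests pre-trained transformers are more robust to.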