Joshua Wilson
2025
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
Joshua Wilson | Christopher Ormerod | Magdalen Beiting Parrish
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Works in Progress
Joshua Wilson | Christopher Ormerod | Magdalen Beiting Parrish
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Coordinated Session Papers
Joshua Wilson | Christopher Ormerod | Magdalen Beiting Parrish
Evaluating LLM-Based Automated Essay Scoring: Accuracy, Fairness, and Validity
Yue Huang | Joshua Wilson
Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Works in Progress
This study evaluates large language models (LLMs) for automated essay scoring (AES), comparing prompting strategies and assessing fairness across student groups. We found that well-designed prompting helps LLMs approach the performance of traditional AES systems, but both approaches diverge from human scores for English language learners (ELLs): the traditional model shows larger overall gaps, while LLMs show subtler disparities.