Can Language Models Guess Your Identity? Analyzing Demographic Biases in AI Essay Scoring

Alexander Kwako, Christopher Ormerod


Abstract
Large language models (LLMs) are increasingly used for automated scoring of student essays. However, these models may perpetuate societal biases if not carefully monitored. This study analyzes potential biases in an LLM (XLNet) trained to score persuasive student essays, based on data from the PERSUADE corpus. XLNet achieved strong performance as measured by quadratic weighted kappa, standardized mean difference, and exact agreement with human scores. Using available metadata, we analyzed scoring differences across gender, race/ethnicity, English language learning status, socioeconomic status, and disability status. Automated scores slightly magnified marginal differences already present in human scoring, favoring female students over male students and White students over Black students. To further probe potential biases, we tested whether separate XLNet classifiers and XLNet hidden states could predict demographic membership, and found that both did so only weakly. Overall, the results reinforce the need for continued fairness analyses as the use of LLMs expands in education.
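The three agreement metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the integer score range passed as `min_score` and `max_score`, and the pooled-standard-deviation form of the standardized mean difference are all assumptions made here for illustration; the paper may define these quantities differently.

```python
from math import sqrt
from statistics import mean, variance


def exact_agreement(human, machine):
    """Fraction of essays where the machine score equals the human score."""
    return sum(h == m for h, m in zip(human, machine)) / len(human)


def standardized_mean_difference(human, machine):
    """(mean machine - mean human) / pooled SD; the pooled form is an assumption."""
    pooled_sd = sqrt((variance(human) + variance(machine)) / 2)
    return (mean(machine) - mean(human)) / pooled_sd


def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Cohen's kappa with quadratic disagreement weights over integer scores."""
    k = max_score - min_score + 1
    n = len(human)
    # Observed confusion matrix: rows are human scores, columns machine scores.
    observed = [[0] * k for _ in range(k)]
    for h, m in zip(human, machine):
        observed[h - min_score][m - min_score] += 1
    hist_h = [sum(row) for row in observed]
    hist_m = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two raters.
            weighted_expected += weight * hist_h[i] * hist_m[j] / n
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical toy scores on an assumed 1-4 scale, for illustration only.
human_scores = [1, 2, 3, 4]
machine_scores = [2, 2, 3, 4]
print(exact_agreement(human_scores, machine_scores))
print(standardized_mean_difference(human_scores, machine_scores))
print(quadratic_weighted_kappa(human_scores, machine_scores, 1, 4))
```

Under these assumptions, a positive standardized mean difference indicates that the machine scores are higher on average than the human scores. A fairness analysis of the kind described above would compute such quantities separately within each demographic subgroup.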
Anthology ID:
2024.bea-1.7
Volume:
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Ekaterina Kochmar, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
78–86
URL:
https://aclanthology.org/2024.bea-1.7
Cite (ACL):
Alexander Kwako and Christopher Ormerod. 2024. Can Language Models Guess Your Identity? Analyzing Demographic Biases in AI Essay Scoring. In Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024), pages 78–86, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Can Language Models Guess Your Identity? Analyzing Demographic Biases in AI Essay Scoring (Kwako & Ormerod, BEA 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.bea-1.7.pdf