Abstract
This study combined Generalizability Theory with entropy-derived stratification to optimize the reliability of automated essay scoring. A generalizability study (G-study) decomposed score variance across 14 encoders and 3 seeds; decision studies (D-studies) then identified the smallest ensembles achieving G ≥ 0.85. A hybrid of one medium and one small encoder with two seeds maximized dependability per unit of compute cost. Stratification ensured uniform precision across…
- Anthology ID: 2025.aimecon-main.34
- Volume: Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers
- Month: October
- Year: 2025
- Address: Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States
- Editors: Joshua Wilson, Christopher Ormerod, Magdalen Beiting Parrish
- Venue: AIME-Con
- Publisher: National Council on Measurement in Education (NCME)
- Pages: 312–328
- URL: https://preview.aclanthology.org/more-markup/2025.aimecon-main.34/
- Cite (ACL): Yi Gui. 2025. From Entropy to Generalizability: Strengthening Automated Essay Scoring Reliability and Sustainability. In Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers, pages 312–328, Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States. National Council on Measurement in Education (NCME).
- Cite (Informal): From Entropy to Generalizability: Strengthening Automated Essay Scoring Reliability and Sustainability (Gui, AIME-Con 2025)
- PDF: https://preview.aclanthology.org/more-markup/2025.aimecon-main.34.pdf
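The D-study step described in the abstract (projecting reliability for smaller ensembles and keeping the cheapest one that reaches G ≥ 0.85) can be illustrated with the standard G-theory projection formulas. The following is a minimal sketch, not the paper's code. It makes several assumptions. The design is a fully crossed essays × encoders × seeds (p × e × s) random-effects design, although the paper may instead treat seeds as nested within encoders. The variance components have already been estimated. Every encoder-seed run costs the same. The names `VarianceComponents`, `g_coefficients`, and `cheapest_design` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class VarianceComponents:
    """Assumed fully crossed p x e x s random design:
    p = essays (objects of measurement), e = encoders, s = seeds."""
    p: float
    e: float
    s: float
    pe: float
    ps: float
    es: float
    pes: float  # three-way interaction, confounded with residual error


def g_coefficients(vc: VarianceComponents, n_e: int, n_s: int) -> tuple[float, float]:
    """Return (relative G coefficient, dependability Phi) for scores
    averaged over n_e encoders and n_s seeds."""
    # Relative error: interactions of essays with the averaged-over facets.
    rel_err = vc.pe / n_e + vc.ps / n_s + vc.pes / (n_e * n_s)
    # Absolute error additionally counts facet main effects and their interaction.
    abs_err = rel_err + vc.e / n_e + vc.s / n_s + vc.es / (n_e * n_s)
    return vc.p / (vc.p + rel_err), vc.p / (vc.p + abs_err)


def cheapest_design(vc: VarianceComponents, threshold: float = 0.85,
                    max_e: int = 14, max_s: int = 3,
                    cost_per_run: float = 1.0, criterion: str = "g"):
    """Return (cost, n_e, n_s, G, Phi) for the lowest-cost design whose chosen
    coefficient meets the threshold, or None if no design qualifies.
    Hypothetical cost model: each encoder-seed run costs cost_per_run."""
    if criterion not in ("g", "phi"):
        raise ValueError("criterion must be 'g' or 'phi'")
    feasible = []
    for n_e, n_s in product(range(1, max_e + 1), range(1, max_s + 1)):
        g_rel, phi = g_coefficients(vc, n_e, n_s)
        score = g_rel if criterion == "g" else phi
        if score >= threshold:
            feasible.append((n_e * n_s * cost_per_run, n_e, n_s, g_rel, phi))
    # Tuples compare by cost first, then by fewer encoders, then fewer seeds.
    return min(feasible) if feasible else None
```

With illustrative components that are not taken from the paper, `VarianceComponents(0.60, 0.02, 0.01, 0.08, 0.03, 0.005, 0.10)`, `cheapest_design` selects three encoders with one seed under this uniform cost model. Because the sketch treats encoders as interchangeable draws with equal cost, it cannot reproduce a mixed medium/small ensemble like the one the abstract reports. That would require per-encoder cost and variance estimates.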