Generative-Evaluative Agreement: A Necessary Validity Criterion for LLM-Enabled Adaptive Assessment

Grandee Lee; Yue Wang; Che Yee Lye; Luke Peh

Abstract

When the same LLM generates assessment items, simulates student responses, and scores them, the validation loop is self-referential. We introduce Generative-Evaluative Agreement (GEA), a validity criterion measuring whether an LLM’s scoring function recovers the skill levels its generative function was instructed to produce. In the first direct measurement of GEA on a two-stage adaptive assessment, the model recovers roughly half the intended variance (r = 0.698) with systematic positive bias. GEA is strong (r > 0.7) for syntactically verifiable skills but near zero for design-level skills, and low-skill overestimation inflates scores near the routing threshold. We argue that granular, skill-decomposed rubrics are the principal proposed mechanism for strengthening GEA and outline complementary mitigations.
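The agreement measure described in the abstract can be illustrated with a small sketch. It assumes GEA is summarized as a Pearson correlation between the skill levels the generator was instructed to produce and the levels the scorer recovers, plus a mean signed difference for bias; the abstract does not spell out the exact computation, so the function names and the toy data below are hypothetical. For reference, r = 0.698 corresponds to r² ≈ 0.487, which is the "roughly half the intended variance" figure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def mean_bias(intended, recovered):
    """Mean signed difference; positive means the scorer overestimates."""
    return sum(r - i for i, r in zip(intended, recovered)) / len(intended)


# Hypothetical toy data (not from the paper): skill levels the generator
# was asked to simulate, and the levels the scorer assigned afterwards.
intended = [1, 2, 3, 4, 5]
recovered = [2.5, 2.5, 3.5, 4.0, 5.0]

r = pearson_r(intended, recovered)
bias = mean_bias(intended, recovered)
print(f"agreement r = {r:.3f}, shared variance r^2 = {r * r:.3f}")
print(f"mean bias = {bias:+.2f}")
```

In this toy data the overestimation is concentrated at the low-skill end, which is the pattern the abstract reports near the routing threshold. Computing the same two numbers per skill, rather than overall, would be one way to see the contrast between syntactically verifiable and design-level skills.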

Anthology ID: 2026.bea-1.54
Volume: Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Bashar Alhafni, Stefano Bannò, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anais Tack, Victoria Yaneva, Zheng Yuan
Venues: BEA | WS
Publisher: Association for Computational Linguistics
Pages: 798–812
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.54/
Cite (ACL): Grandee Lee, Yue Wang, Che Yee Lye, and Luke Peh. 2026. Generative-Evaluative Agreement: A Necessary Validity Criterion for LLM-Enabled Adaptive Assessment. In Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026), pages 798–812, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Generative-Evaluative Agreement: A Necessary Validity Criterion for LLM-Enabled Adaptive Assessment (Lee et al., BEA 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.54.pdf
