@inproceedings{palermo-etal-2025-operational,
    title = "Operational Alignment of Confidence-Based Flagging Methods in Automated Scoring",
    author = "Palermo, Corey  and
      Chen, Troy  and
      Wibowo, Arianto",
    editor = "Wilson, Joshua  and
      Ormerod, Christopher  and
      Beiting Parrish, Magdalen",
    booktitle = "Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Coordinated Session Papers",
    month = oct,
    year = "2025",
    address = "Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States",
    publisher = "National Council on Measurement in Education (NCME)",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.aimecon-sessions.6/",
    pages = "56--60",
    ISBN = "979-8-218-84230-7",
    abstract = "Correct answers to math problems don{'}t reveal if students understand concepts or just memorized procedures. Conversation-Based Assessment (CBA) addresses this through AI dialogue, but reliable scoring requires costly pilots and specialized expertise. Our Criteria Development Platform (CDP) enables pre-pilot optimization using synthetic data, reducing development from months to days. Testing 17 math items through 68 iterations, all achieved our reliability threshold (MCC {\ensuremath{\geq}} 0.80) after refinement {--} up from 59{\%} initially. Without refinement, 7 items would have remained below this threshold. By making reliability validation accessible, CDP empowers educators to develop assessments meeting automated scoring standards."
}