Jongwoo Kim
2026
FeedEval: Pedagogically Aligned Evaluation of LLM-Generated Essay Feedback
SeongYeub Chu | Jongwoo Kim | Mun Yong Yi
Findings of the Association for Computational Linguistics: ACL 2026
Going beyond the prediction of numerical scores, recent research in automated essay scoring has increasingly emphasized the generation of high-quality feedback that provides justification and actionable guidance. To mitigate the high cost of expert annotation, prior work has commonly relied on LLM-generated feedback to train essay assessment models. However, such feedback is often incorporated without explicit quality validation, resulting in the propagation of noise in downstream applications. To address this limitation, we propose FeedEval, an LLM-based framework for evaluating LLM-generated essay feedback along three pedagogically grounded dimensions: specificity, helpfulness, and validity. FeedEval employs dimension-specialized LLM evaluators trained on datasets curated in this study to assess multiple feedback candidates and select high-quality feedback for downstream use. Experiments on the ASAP++ benchmark show that FeedEval closely aligns with human expert judgments and that essay scoring models trained with FeedEval-filtered high-quality feedback achieve superior scoring performance. Furthermore, revision experiments using small LLMs show that the high-quality feedback identified by FeedEval leads to more effective essay revisions. We release our code and curated datasets at: https://github.com/BBeeChu/FeedEval.git.
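The selection step described in the abstract (score several feedback candidates on three dimensions, keep the best) can be illustrated with a minimal sketch. It is not the released FeedEval code: the `keyword_scorer` helper is a hypothetical stand-in for the trained dimension-specialized LLM evaluators, and combining the three scores by an unweighted sum is an assumption made only for illustration.

```python
from typing import Callable, Dict, List

# The three dimensions named in the abstract.
DIMENSIONS = ("specificity", "helpfulness", "validity")

# An evaluator maps (essay, feedback) to a score. In the paper these are
# trained LLM evaluators; here they are toy functions (assumption).
Evaluator = Callable[[str, str], float]


def keyword_scorer(keywords: List[str]) -> Evaluator:
    """Hypothetical toy evaluator: counts which keywords occur in the feedback."""
    def score(essay: str, feedback: str) -> float:
        text = feedback.lower()
        return float(sum(k in text for k in keywords))
    return score


def select_feedback(essay: str, candidates: List[str],
                    evaluators: Dict[str, Evaluator]) -> str:
    """Return the candidate with the highest summed score across dimensions.

    The unweighted sum is an assumption; the abstract does not state how
    the per-dimension scores are combined.
    """
    def total(feedback: str) -> float:
        return sum(evaluators[d](essay, feedback) for d in DIMENSIONS)
    return max(candidates, key=total)


evaluators = {
    "specificity": keyword_scorer(["paragraph", "sentence"]),
    "helpfulness": keyword_scorer(["try", "consider"]),
    "validity": keyword_scorer(["thesis"]),
}
candidates = [
    "Good job.",
    "Consider restating the thesis in the second paragraph.",
]
best = select_feedback("(essay text)", candidates, evaluators)
print(best)
```

In practice each toy scorer would be replaced by a call to the corresponding trained evaluator; the selection logic itself stays the same.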
2025
Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing
Jongwoo Kim | SeongYeub Chu | Bryan Wong | Mun Yong Yi
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Language Models (LLMs) have recently emerged as promising tools for knowledge tracing (KT) due to their strong reasoning and generalization abilities. While recent LLM-based KT methods have introduced new prompt formats, they struggle to reflect the histories of example learners within a single prompt during in-context learning (ICL), leading to limited scalability and high computational cost under token constraints. In this work, we present LLM-based Option weighted Knowledge Tracing (LOKT), a simple yet effective LLM-based knowledge tracing framework that encodes the interaction histories of example learners in context as textual categorical option weights (TCOW). These are semantic labels (e.g., “inadequate”) assigned to the options that learners select when answering questions, which help the LLM interpret those histories. Experiments on multiple-choice datasets show that LOKT outperforms existing LLM-based KT models in both warm-start and few-shot settings. Moreover, LOKT enables scalable and cost-efficient inference, performing strongly even under strict token constraints. Our code is available at https://anonymous.4open.science/r/LOKT_model-3233
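As a rough illustration of encoding a learner's history with textual option labels, the sketch below maps a numeric weight for each selected option to a semantic label and renders the history as compact prompt lines. Only the label “inadequate” comes from the abstract. The other labels, the thresholds, the numeric weights, and the prompt layout are all assumptions, and this is not the LOKT implementation.

```python
from typing import Dict, List, Tuple

# (exclusive upper bound, label). Only "inadequate" appears in the abstract;
# the remaining labels and all thresholds are hypothetical.
LABELS: List[Tuple[float, str]] = [
    (0.25, "inadequate"),
    (0.50, "partial"),
    (0.75, "adequate"),
    (1.01, "proficient"),
]


def to_label(weight: float) -> str:
    """Map a numeric option weight in [0, 1] to a textual category."""
    if weight < 0.0:
        raise ValueError(f"weight out of range: {weight}")
    for upper, label in LABELS:
        if weight < upper:
            return label
    raise ValueError(f"weight out of range: {weight}")


def encode_history(history: List[Tuple[str, str]],
                   option_weights: Dict[Tuple[str, str], float]) -> str:
    """Render (question, chosen option) pairs as one short line each."""
    lines = []
    for question_id, option in history:
        label = to_label(option_weights[(question_id, option)])
        lines.append(f"{question_id}: chose {option} ({label})")
    return "\n".join(lines)


# Hypothetical weights for two question/option pairs.
weights = {("Q1", "B"): 0.1, ("Q2", "A"): 0.9}
prompt_block = encode_history([("Q1", "B"), ("Q2", "A")], weights)
print(prompt_block)
```

Replacing each full question and option text with a short label is what keeps the per-interaction token cost low in this sketch, which matches the token-efficiency motivation stated in the abstract.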