Mun Yong Yi

2026

FeedEval: Pedagogically Aligned Evaluation of LLM-Generated Essay Feedback
SeongYeub Chu | Jongwoo Kim | Mun Yong Yi
Findings of the Association for Computational Linguistics: ACL 2026

Going beyond the prediction of numerical scores, recent research in automated essay scoring has increasingly emphasized the generation of high-quality feedback that provides justification and actionable guidance. To mitigate the high cost of expert annotation, prior work has commonly relied on LLM-generated feedback to train essay assessment models. However, such feedback is often incorporated without explicit quality validation, resulting in the propagation of noise in downstream applications. To address this limitation, we propose FeedEval, an LLM-based framework for evaluating LLM-generated essay feedback along three pedagogically grounded dimensions: specificity, helpfulness, and validity. FeedEval employs dimension-specialized LLM evaluators trained on datasets curated in this study to assess multiple feedback candidates and select high-quality feedback for downstream use. Experiments on the ASAP++ benchmark show that FeedEval closely aligns with human expert judgments and that essay scoring models trained with FeedEval-filtered high-quality feedback achieve superior scoring performance. Furthermore, revision experiments using small LLMs show that the high-quality feedback identified by FeedEval leads to more effective essay revisions. We release our code and curated datasets at: https://github.com/BBeeChu/FeedEval.git.

Distilling LLM Reasoning into Dense Encoders: Bridging the Accuracy-Efficiency Gap in Recommendation
Donghee Han | Daeyoung Roh | A Young Kim | Hwanjun Song | Mun Yong Yi
Findings of the Association for Computational Linguistics: ACL 2026

Large Language Models (LLMs) have shown remarkable potential in recommendation systems but suffer from prohibitive inference latency. Existing distillation approaches, which typically target Small Language Models (SLMs) or Conventional Recommendation Models (CRMs), face a critical trade-off between computational cost and semantic reasoning capacity. To bridge this accuracy-efficiency gap, we introduce Reasoning-to-Encoder Distillation (R2END), a framework that establishes a text encoder as the optimal student architecture for scalable recommendation. Unlike methods that mimic token generation, R2END compresses the teacher’s reasoning into a dense vector space via a semantic alignment objective, effectively capturing user-item dynamics. Extensive experiments on four datasets demonstrate that R2END not only outperforms state-of-the-art baselines but also achieves drastically reduced latency, offering a sweet spot for recommendation.

2025

Leveraging LLM-Generated Schema Descriptions for Unanswerable Question Detection in Clinical Data
Donghee Han | Seungjae Lim | Daeyoung Roh | Sangryul Kim | Sehyun Kim | Mun Yong Yi
Proceedings of the 31st International Conference on Computational Linguistics

Recent advancements in large language models (LLMs) have boosted research on generating SQL queries from domain-specific questions, particularly in the medical domain. A key challenge is detecting and filtering unanswerable questions. Existing methods often rely on model uncertainty, but these require extra resources and lack interpretability. We propose a lightweight model that predicts relevant database schemas to detect unanswerable questions, enhancing interpretability and addressing the data imbalance in binary classification tasks. Furthermore, we found that LLM-generated schema descriptions can significantly enhance the prediction accuracy. Our method provides a resource-efficient solution for unanswerable question detection in domain-specific question answering systems.

Rethinking LLM-Based Recommendations: A Personalized Query-Driven Parallel Integration
Donghee Han | Hwanjun Song | Mun Yong Yi
Findings of the Association for Computational Linguistics: EMNLP 2025

Recent studies have explored integrating large language models (LLMs) into recommendation systems but face several challenges, including training-induced bias and bottlenecks from serialized architecture. To effectively address these issues, we propose Query-to-Recommendation, a parallel recommendation framework that decouples LLMs from candidate pre-selection and instead enables direct retrieval over the entire item pool. Our framework connects LLMs and recommendation models in a parallel manner, allowing each component to independently utilize its strengths without interfering with the other. In this framework, LLMs are utilized to generate feature-enriched item descriptions and personalized user queries, allowing for capturing diverse preferences and enabling rich semantic matching in a zero-shot manner. To effectively combine the complementary strengths of LLM and collaborative signals, we introduce an adaptive reranking strategy. Extensive experiments demonstrate a performance improvement of up to 57%, while also improving the novelty and diversity of recommendations.

Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing
Jongwoo Kim | SeongYeub Chu | Bryan Wong | Mun Yong Yi
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Language Models (LLMs) have recently emerged as promising tools for knowledge tracing due to their strong reasoning and generalization abilities. While recent LLM-based KT methods have introduced new prompt formats, they struggle to reflect the histories of example learners within a single prompt during in-context learning (ICL), leading to limited scalability and high computational cost under token constraints. In this work, we present LLM-based Option weighted Knowledge Tracing (LOKT), a simple yet effective LLM-based knowledge tracing framework that encodes the interaction histories of example learners in context as textual categorical option weights (TCOW). These are semantic labels (e.g., “inadequate”) assigned to the options selected by learners when answering questions, which help the LLM understand the learners' responses. Experiments on multiple-choice datasets show that LOKT outperforms existing LLM-based KT models in both warm-start and few-shot settings. Moreover, LOKT enables scalable and cost-efficient inference, performing strongly even under strict token constraints. Our code is available at https://anonymous.4open.science/r/LOKT_model-3233

Rationale Behind Essay Scores: Enhancing S-LLM’s Multi-Trait Essay Scoring with Rationale Generated by LLMs
SeongYeub Chu | Jong Woo Kim | Bryan Wong | Mun Yong Yi
Findings of the Association for Computational Linguistics: NAACL 2025

Existing automated essay scoring (AES) has solely relied on essay text without using explanatory rationales for the scores, thereby forgoing an opportunity to capture the specific aspects evaluated by rubric indicators in a fine-grained manner. This paper introduces Rationale-based Multiple Trait Scoring (RMTS), a novel approach for multi-trait essay scoring that integrates prompt-engineering-based large language models (LLMs) with a fine-tuning-based essay scoring model using a smaller large language model (S-LLM). RMTS uses an LLM-based trait-wise rationale generation system where a separate LLM agent generates trait-specific rationales based on rubric guidelines, which the scoring model uses to accurately predict multi-trait scores. Extensive experiments on benchmark datasets, including ASAP, ASAP++, and Feedback Prize, show that RMTS significantly outperforms state-of-the-art models and vanilla S-LLMs in trait-specific scoring. By assisting quantitative assessment with fine-grained qualitative rationales, RMTS enhances the trait-wise reliability, providing partial explanations about essays. The code is available at https://github.com/BBeeChu/RMTS.git.
