Direct Repair Optimization: Training Small Language Models For Educational Program Repair Improves Feedback

Charles Koutcheme, Nicola Dainese, Arto Hellas


Abstract
Locally deployed Small Language Models (SLMs) offer a promising solution for providing timely and effective programming feedback to students learning to code. However, SLMs often produce misleading or hallucinated feedback, limiting their reliability in educational settings. Current approaches for improving SLM feedback rely on existing human annotations or LLM-generated feedback. This paper addresses a fundamental challenge: Can we improve SLMs’ feedback capabilities without relying on human or LLM-generated annotations? We demonstrate that training SLMs on the proxy task of program repair is sufficient to enhance their ability to generate high-quality feedback. To this end, we introduce Direct Repair Optimization (DRO), a self-supervised online reinforcement learning strategy that trains language models to reason about how to efficiently fix students’ programs. Our experiments, using DRO to fine-tune LLaMA-3.1-3B and Qwen-2.5-3B on a large-scale dataset of Python submissions from real students, show substantial improvements on downstream feedback tasks. We release our code to support further research in educational feedback and highlight promising directions for future work.
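To make the self-supervision idea concrete, the sketch below shows one plausible shape for a repair-based reward signal: a candidate fix is scored by how many unit tests it passes, with a bonus for staying close to the student's original code. This is a hypothetical illustration only; the actual DRO reward, the `solve` function convention, and the similarity weighting are assumptions, not details taken from the paper.

```python
import difflib

def run_tests(program_src, tests):
    """Execute a candidate program and return its unit-test pass rate.

    Each test is a (args, expected) pair for a function named `solve`
    (a hypothetical convention for this sketch).
    """
    namespace = {}
    try:
        exec(program_src, namespace)
        solve = namespace["solve"]
    except Exception:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if solve(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(tests)

def repair_reward(student_src, candidate_src, tests, alpha=0.5):
    """Self-supervised reward: test correctness plus a similarity bonus
    that favors minimal edits to the student's submission."""
    correctness = run_tests(candidate_src, tests)
    similarity = difflib.SequenceMatcher(
        None, student_src, candidate_src).ratio()
    # Credit similarity only in proportion to correctness, so the model
    # is not rewarded for copying the buggy program verbatim.
    return correctness * (1.0 + alpha * similarity) / (1.0 + alpha)

# Toy example: an off-by-one bug in a summation exercise.
buggy = "def solve(n):\n    return sum(range(n))\n"
fixed = "def solve(n):\n    return sum(range(n + 1))\n"
tests = [((3,), 6), ((4,), 10)]

print(repair_reward(buggy, fixed, tests))  # high: passes tests, small edit
print(repair_reward(buggy, buggy, tests))  # 0.0: fails all tests
```

Because the tests come from the course autograder and the student's own submission supplies the starting point, no human or LLM feedback annotations are needed, matching the self-supervised framing in the abstract.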
Anthology ID:
2025.bea-1.41
Volume:
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
Venues:
BEA | WS
Publisher:
Association for Computational Linguistics
Pages:
564–581
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.41/
Cite (ACL):
Charles Koutcheme, Nicola Dainese, and Arto Hellas. 2025. Direct Repair Optimization: Training Small Language Models For Educational Program Repair Improves Feedback. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 564–581, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Direct Repair Optimization: Training Small Language Models For Educational Program Repair Improves Feedback (Koutcheme et al., BEA 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.41.pdf