@inproceedings{kumar-etal-2026-persa,
title = "{PERSA}: Reinforcement Learning for Professor-Style Personalized Feedback with {LLM}s",
author = "Kumar, Ravi and
Grover, Utkarsh and
Lin, Xiaomin and
Polyzou, Agoritsa",
editor = "Kochmar, Ekaterina and
Alhafni, Bashar and
Bann{\`o}, Stefano and
Bexte, Marie and
Burstein, Jill and
Horbach, Andrea and
Laarmann-Quante, Ronja and
Tack, Anais and
Yaneva, Victoria and
Yuan, Zheng",
booktitle = "Proceedings of the 21st Workshop on Innovative Use of {NLP} for Building Educational Applications ({BEA} 2026)",
month = jul,
year = "2026",
address = "San Diego, California, USA",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.37/",
pages = "529--545",
ISBN = "979-8-89176-409-5",
abstract = "Large language models (LLMs) can provide automated feedback in educational settings, but aligning an LLM{'}s style with a specific instructor{'}s tone while maintaining diagnostic correctness remains challenging. We ask: how can we update an LLM for automated feedback generation to align with a target instructor{'}s style without sacrificing core knowledge? We study how Reinforcement Learning from Human Feedback (RLHF) can adapt a transformer-based LLM to generate programming feedback that matches a professor{'}s grading voice. We introduce PERSA, an RLHF pipeline that combines supervised fine-tuning on professor demonstrations, reward modeling from pairwise preferences, and Proximal-based policy optimization, while deliberately constraining learning to style-bearing components. Motivated by analyses of transformer internals, PERSA applies parameter-efficient fine-tuning. It updates only the top transformer blocks and their feed-forward projections, minimizing global parameter drift while increasing stylistic controllability. We evaluate our proposed approach on three code-feedback benchmarks (APPS, PyFiXV, and CodeReviewQA) using complementary metrics for style alignment and fidelity. Across both Llama-3 and Gemma-2 backbones, PERSA delivers the strongest professor-style transfer while preserving perfect correctness; for example, on APPS, it boosts Style Alignment Score (SAC) to 96.2{\%} (from 34.8{\%} for Base) with Correctness Accuracy (CA) up to 100{\%} on Llama-3 and Gemma-2. Overall, PERSA offers a practical route to personalized educational feedback by aligning both what it says (content correctness) and, crucially, how it says it (instructor-like tone, structure, and guidance)."
}