PERSA: Reinforcement Learning for Professor-Style Personalized Feedback with LLMs

Ravi Kumar, Utkarsh Grover, Xiaomin Lin, Agoritsa Polyzou


Abstract
Large language models (LLMs) can provide automated feedback in educational settings, but aligning an LLM’s style with a specific instructor’s tone while maintaining diagnostic correctness remains challenging. We ask: how can we update an LLM for automated feedback generation so that it aligns with a target instructor’s style without sacrificing core knowledge? We study how Reinforcement Learning from Human Feedback (RLHF) can adapt a transformer-based LLM to generate programming feedback that matches a professor’s grading voice. We introduce PERSA, an RLHF pipeline that combines supervised fine-tuning on professor demonstrations, reward modeling from pairwise preferences, and proximal policy optimization, while deliberately constraining learning to style-bearing components. Motivated by analyses of transformer internals, PERSA applies parameter-efficient fine-tuning: it updates only the top transformer blocks and their feed-forward projections, minimizing global parameter drift while increasing stylistic controllability. We evaluate our approach on three code-feedback benchmarks (APPS, PyFiXV, and CodeReviewQA) using complementary metrics for style alignment and fidelity. Across both Llama-3 and Gemma-2 backbones, PERSA delivers the strongest professor-style transfer while preserving perfect correctness. For example, on APPS it raises the Style Alignment Score (SAC) to 96.2% (from 34.8% for the Base model), with Correctness Accuracy (CA) of up to 100% on Llama-3 and Gemma-2. Overall, PERSA offers a practical route to personalized educational feedback by aligning both what the model says (content correctness) and, crucially, how it says it (instructor-like tone, structure, and guidance).
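The abstract names PERSA's training ingredients without giving their formulas. The Python sketch below illustrates common textbook forms of three of them; it is not the authors' code, and whether PERSA uses exactly these forms is an assumption. `select_trainable` reflects one reading of "top transformer blocks and their feed-forward projections" (only the feed-forward projections inside the top `top_k` blocks stay trainable) and assumes Hugging Face-style Llama/Gemma-2 parameter names. The reward loss is the standard Bradley-Terry pairwise objective, and the policy-update terms are the standard PPO clipped surrogate plus a KL-style penalty toward the SFT reference. All helper names and the defaults for `top_k`, `beta`, and `clip_eps` are hypothetical.

```python
import math
import re

# ASSUMPTION: Hugging Face-style Llama/Gemma-2 parameter names such as
# "model.layers.31.mlp.down_proj.weight". Other checkpoints may differ.
FFN_PATTERN = re.compile(r"^model\.layers\.(\d+)\.mlp\.(gate_proj|up_proj|down_proj)\.")


def select_trainable(param_names, num_layers, top_k):
    """Hypothetical helper: keep only feed-forward projections in the top_k blocks.

    All other parameters (embeddings, attention, lower blocks) stay frozen,
    which limits how far the model can drift from its pretrained weights.
    """
    first_trainable = num_layers - top_k
    trainable = []
    for name in param_names:
        match = FFN_PATTERN.match(name)
        if match and int(match.group(1)) >= first_trainable:
            trainable.append(name)
    return trainable


def pairwise_reward_loss(r_chosen, r_rejected):
    """Bradley-Terry loss for one preference pair: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def shaped_reward(rm_score, logp_policy, logp_ref, beta=0.1):
    """Reward-model score minus a KL-style penalty toward the SFT reference policy.

    logp_policy and logp_ref are sequence log-probabilities of the sampled feedback.
    """
    return rm_score - beta * (logp_policy - logp_ref)


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one action (maximized during policy updates)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)
```

For a 32-block model, `select_trainable(names, num_layers=32, top_k=2)` keeps only the `gate_proj`, `up_proj`, and `down_proj` weights of blocks 30 and 31. In PyTorch one would then set `requires_grad = False` on every other parameter, so the reward-guided updates can only reshape those upper feed-forward layers.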
Anthology ID: 2026.bea-1.37
Volume: Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Bashar Alhafni, Stefano Bannò, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anais Tack, Victoria Yaneva, Zheng Yuan
Venues: BEA | WS
Publisher: Association for Computational Linguistics
Pages: 529–545
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.37/
Cite (ACL): Ravi Kumar, Utkarsh Grover, Xiaomin Lin, and Agoritsa Polyzou. 2026. PERSA: Reinforcement Learning for Professor-Style Personalized Feedback with LLMs. In Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026), pages 529–545, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): PERSA: Reinforcement Learning for Professor-Style Personalized Feedback with LLMs (Kumar et al., BEA 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.37.pdf