CodeGuard: Improving LLM Guardrails in CS Education

Nishat Raihan, Noah Erdachew, Fnu Jayoti Devi, Joanna C. S. Santos, Marcos Zampieri


Abstract
Large language models (LLMs) are increasingly embedded in Computer Science (CS) classrooms to automate code generation, feedback, and assessment. However, their susceptibility to adversarial or ill-intentioned prompts threatens student learning and academic integrity. To address this issue, we evaluate how existing off-the-shelf LLMs handle unsafe and irrelevant prompts in the domain of CS education. We identify important shortcomings in existing LLM guardrails, which motivate us to propose CodeGuard, a comprehensive guardrail framework for educational AI systems. CodeGuard includes (i) a first-of-its-kind taxonomy for classifying prompts; (ii) the CodeGuard dataset, a collection of 8,000 prompts spanning the taxonomy; and (iii) PromptShield, a lightweight sentence-encoder model fine-tuned to detect unsafe prompts in real time. Experiments show that PromptShield achieves an F1 score of 0.93, surpassing existing guardrail methods. Further experiments reveal that CodeGuard reduces potentially harmful or policy-violating code completions by 30–65% without degrading performance on legitimate educational tasks. The code, datasets, and evaluation scripts are freely available to the community.
Anthology ID:
2026.findings-eacl.48
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
937–949
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.48/
Cite (ACL):
Nishat Raihan, Noah Erdachew, Fnu Jayoti Devi, Joanna C. S. Santos, and Marcos Zampieri. 2026. CodeGuard: Improving LLM Guardrails in CS Education. In Findings of the Association for Computational Linguistics: EACL 2026, pages 937–949, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
CodeGuard: Improving LLM Guardrails in CS Education (Raihan et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.48.pdf
Checklist:
 2026.findings-eacl.48.checklist.pdf