McMining: Automated Discovery of Misconceptions in Student Code

Erfan Al-Hossami, Razvan Bunescu


Abstract
When learning to code, students often develop misconceptions about various programming language concepts. These can not only lead to bugs or inefficient code, but also slow down the learning of related concepts. In this paper, we introduce McMining, the task of mining programming misconceptions from samples of code from a student. To enable the training and evaluation of McMining systems, we develop an extensible benchmark dataset of misconceptions, together with a large set of code samples where these misconceptions are manifested. We then introduce two LLM-based McMiner approaches and, through extensive evaluations, show that models from the Gemini, Claude, and GPT families are effective at discovering misconceptions in student code.
Anthology ID:
2026.eacl-short.10
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
160–178
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.10/
Cite (ACL):
Erfan Al-Hossami and Razvan Bunescu. 2026. McMining: Automated Discovery of Misconceptions in Student Code. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 160–178, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
McMining: Automated Discovery of Misconceptions in Student Code (Al-Hossami & Bunescu, EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.10.pdf
Checklist:
2026.eacl-short.10.checklist.pdf