How Yu

2026

HyperparameterOmens at SemEval-2026 Task 13: Various approaches to detecting machine-generated code
Dmitry Sukhotin | How Yu
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

We present our systems for SemEval-2026 Task 13, built on the Droid resource suite and benchmark setting. For Subtask A (binary classification of human-written vs. machine-generated code), lexical baselines such as TF–IDF and character n-grams transferred poorly from the LeetCode training distribution to the production-code evaluation split. After correcting pipeline errors that obscured true performance and selecting stable AST features under domain shift, our final system uses 5 uncorrelated features and achieves 0.57 macro F1 on the public test set. For Subtask C (4-way authorship classification of human, AI, hybrid, and adversarial), lexical baselines performed poorly under a significant vocabulary shift. Deep semantic models proved more promising, and a per-class weighted ensemble that included these models achieved 0.57 macro F1 on the public test set.

Co-authors

Dmitry Sukhotin 1

Venues

SemEval 1
WS 1
