Rone Brandao Filho
2026
AKCIT at SemEval-2026 Task 13: A Lightweight LightGBM Baseline for Cross-Language Detection of LLM-Generated Code
Rone Brandao Filho | Walcy Santos Rezende Rios | Lucas Neves | Jose Ricardo Fleury Oliveira | Diogo Fernandes | Arlindo Galvão Filho
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
The widespread use of LLMs in software development has made the detection of machine-generated code a pressing challenge, particularly when models must generalize across programming languages and domains. We present a lightweight, LLM-free pipeline that combines stylometric feature extraction with a LightGBM classifier and explicitly prioritizes structural generalization over deep semantic modeling. Despite its simplicity, the method achieves a Macro F1 of 0.70–0.72, more than doubling the CodeBERT baseline (0.30) in SemEval-2026 Task 13 Subtask A, while operating without GPUs or any fine-tuning.
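To make the idea of stylometric feature extraction concrete, the following is a minimal sketch of language-agnostic surface statistics computed from a code snippet. The abstract does not list the features the system actually uses, so the function name `stylometric_features` and every feature in it (line-length statistics, indentation, comment ratio, character entropy, identifier length) are illustrative assumptions, not the authors' feature set.

```python
import math
import re
from collections import Counter
from typing import Dict


def stylometric_features(code: str) -> Dict[str, float]:
    """Surface statistics for one snippet (hypothetical feature set)."""
    lines = code.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]
    denom_lines = max(len(lines), 1)  # avoid division by zero on empty input

    # Line-length statistics over non-empty lines.
    lengths = [len(ln) for ln in non_empty] or [0]
    mean_len = sum(lengths) / len(lengths)
    var_len = sum((x - mean_len) ** 2 for x in lengths) / len(lengths)

    # Leading whitespace per non-empty line (tabs count as one character).
    indents = [len(ln) - len(ln.lstrip(" \t")) for ln in non_empty] or [0]

    # Crude comment heuristic shared across several languages; not a parser,
    # so it also counts lines such as C preprocessor directives.
    comment_lines = sum(
        1 for ln in non_empty if ln.lstrip().startswith(("#", "//", "/*"))
    )

    # Shannon entropy of the character distribution, in bits.
    entropy = 0.0
    if code:
        total = len(code)
        for count in Counter(code).values():
            p = count / total
            entropy -= p * math.log2(p)

    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code)
    mean_ident = (
        sum(len(tok) for tok in identifiers) / len(identifiers)
        if identifiers
        else 0.0
    )

    return {
        "n_lines": float(len(lines)),
        "blank_line_ratio": (len(lines) - len(non_empty)) / denom_lines,
        "comment_line_ratio": comment_lines / max(len(non_empty), 1),
        "mean_line_length": mean_len,
        "std_line_length": math.sqrt(var_len),
        "mean_indent": sum(indents) / len(indents),
        "max_indent": float(max(indents)),
        "char_entropy": entropy,
        "mean_identifier_length": mean_ident,
    }


sample = "def add(a, b):\n    # sum two numbers\n    return a + b\n"
features = stylometric_features(sample)
```

Feature dictionaries like this one could be stacked into a matrix and passed to a gradient-boosted tree classifier such as `lightgbm.LGBMClassifier`. That step is omitted here to keep the sketch dependency-free, and the configuration behind the reported scores is not given in the abstract.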