Linda Kupfer

2026

WWTC@UniA at SemEval-2026 Task 13: BERT-based Code Authorship Detection and Qualitative Analysis
Linda Kupfer | Lisa Hader | Christian Jaumann | Annemarie Friedrich
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper describes our system for SemEval-2026 Task 13 on detecting machine-generated code. We fine-tune small encoder-only models to distinguish human-written from machine-generated code and to identify which large language model (LLM) family generated a given snippet. We find that a strong, general-purpose model (ModernBERT) outperforms models specifically pre-trained for the code domain. In the official evaluation, our system ranks 5th on subtask B and 6th on subtask C. Our detailed analysis reveals that comments and other natural language text within the code snippets provide valuable information for identifying the LLM family that generated them. Moreover, we show that the embeddings of our fine-tuned ModernBERT do not distinguish well between LLM families, but they cluster human-written code by programming language.

