Vedajanaani R S

2026

MedHastra at SemEval-2026 Task 13: Stylometric Ensembles and Transformer Fine-Tuning for Robust AI Code Detection, Attribution, and Adversarial Analysis
Shruti Chandrasekar | Vedajanaani R S | Vijayalakshmi P
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper describes Team MedHastra’s submission to SemEval-2026 Task 13 on detecting machine-generated code across diverse programming languages, generators, and application scenarios. We participated in all three subtasks: (A) binary detection of AI-generated code under out-of-distribution conditions, (B) multi-class attribution across ten large language model families, and (C) classification of human, fully AI-generated, hybrid, and adversarial code. For Subtask A, we implemented a stylometric ensemble combining structural formatting features with word- and character-level TF-IDF representations, trained using Random Forest, Gradient Boosting, and Logistic Regression with soft voting. For Subtasks B and C, we fine-tuned CodeBERT to leverage contextual code representations, incorporating class balancing strategies such as downsampling and weighted cross-entropy. Our results demonstrate that handcrafted stylometric features struggle under strong distribution shift, while transformer-based contextual modeling is more effective for fine-grained attribution and hybrid/adversarial detection. The study highlights the importance of robust contextual representations for realistic AI-assisted programming scenarios.
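The soft voting step mentioned for Subtask A can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `soft_vote` and `predict` and the probability values are hypothetical, chosen only to show how averaging class probabilities can disagree with a simple majority vote.

```python
from typing import List, Sequence


def soft_vote(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class probabilities produced by several classifiers."""
    if not probabilities:
        raise ValueError("need at least one classifier output")
    n_classes = len(probabilities[0])
    if any(len(p) != n_classes for p in probabilities):
        raise ValueError("all classifiers must score the same classes")
    n_models = len(probabilities)
    return [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]


def predict(probabilities: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    averaged = soft_vote(probabilities)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Hypothetical [P(human), P(AI)] outputs for one code snippet (assumed values).
random_forest = [0.90, 0.10]      # confident "human"
gradient_boosting = [0.45, 0.55]  # weak "AI"
logistic_regression = [0.45, 0.55]  # weak "AI"

outputs = [random_forest, gradient_boosting, logistic_regression]
averaged = soft_vote(outputs)
label = predict(outputs)
```

In this assumed example two of the three models lean toward "AI", so a hard majority vote would pick class 1, whereas soft voting picks class 0 because the one confident model outweighs the two uncertain ones.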


MedHastra@DravidianLangTech 2026: Piecewise Style Classification for Telugu Prompt Recovery Using XLM-RoBERTa
Shruti Chandrasekar | Vedajanaani R S | Vijayalakshmi P
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

We present a system for the DravidianLangTech @ ACL 2026 shared task on Telugu Prompt-Style Recovery (B et al., 2026). The task requires classifying Telugu text into one of nine communicative styles: Formal, Informal, Optimistic, Pessimistic, Humorous, Serious, Inspiring, Authoritative, and Persuasive. Our approach fine-tunes the multilingual XLM-RoBERTa base model with a piecewise segment comparison strategy that evaluates distinct stylistic markers across sentence segments, enabling richer contextual discrimination between visually similar styles. Evaluated on the official test set, our system achieves a Macro F1 score of 0.1205, Accuracy of 0.1196, Precision of 0.1205, and Recall of 0.1231. We analyze the challenges of stylistic ambiguity in low-resource Telugu NLP and discuss directions for future improvement.
