Vijayalakshmi P
2026
MedHastra at SemEval-2026 Task 13: Stylometric Ensembles and Transformer Fine-Tuning for Robust AI Code Detection, Attribution, and Adversarial Analysis
Shruti Chandrasekar | Vedajanaani R S | Vijayalakshmi P
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper describes Team MedHastra’s submission to SemEval-2026 Task 13 on detecting machine-generated code across diverse programming languages, generators, and application scenarios. We participated in all three subtasks: (A) binary detection of AI-generated code under out-of-distribution conditions, (B) multi-class attribution across ten large language model families, and (C) classification of code as human-written, fully AI-generated, hybrid, or adversarial. For Subtask A, we implemented a stylometric ensemble that combines structural formatting features with word- and character-level TF-IDF representations, training Random Forest, Gradient Boosting, and Logistic Regression classifiers and combining their predictions by soft voting. For Subtasks B and C, we fine-tuned CodeBERT to leverage contextual code representations, with class-balancing strategies such as downsampling and weighted cross-entropy. Our results show that handcrafted stylometric features struggle under strong distribution shift, whereas transformer-based contextual modeling is more effective for fine-grained attribution and for hybrid and adversarial detection. The study highlights the importance of robust contextual representations for realistic AI-assisted programming scenarios.
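The abstract names two generic techniques without giving implementation details: soft voting over several classifiers (Subtask A) and class-weighted cross-entropy (Subtasks B and C). The following is a minimal pure-Python sketch of both, assuming each fitted classifier already outputs a class-probability vector; all function names and the inverse-frequency weighting heuristic are illustrative assumptions, not the team's code. In practice the same ideas correspond to scikit-learn's `VotingClassifier(voting="soft")` and to `torch.nn.CrossEntropyLoss(weight=...)` with its default mean reduction.

```python
import math
from typing import List, Optional, Sequence


def soft_vote(member_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Soft voting: (weighted) average of each member's class-probability vector."""
    if weights is None:
        weights = [1.0] * len(member_probs)  # equal weight per classifier (assumption)
    total = sum(weights)
    n_classes = len(member_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, member_probs)) / total
        for c in range(n_classes)
    ]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def hard_vote(member_probs: Sequence[Sequence[float]]) -> int:
    """Majority vote over each member's top class, shown only for contrast."""
    votes = [argmax(p) for p in member_probs]
    return max(range(len(member_probs[0])), key=votes.count)


def inverse_frequency_weights(labels: Sequence[int], n_classes: int) -> List[float]:
    """Hypothetical class weights n_samples / (n_classes * count_c) ("balanced")."""
    counts = [0] * n_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (n_classes * c) if c else 0.0 for c in counts]


def weighted_cross_entropy(probs: Sequence[Sequence[float]],
                           labels: Sequence[int],
                           class_weights: Sequence[float]) -> float:
    """Sum of w_y * -log p(y), normalised by the sum of the per-sample weights."""
    num, den = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        num += -w * math.log(p[y])
        den += w
    return num / den


if __name__ == "__main__":
    # Hypothetical [P(human), P(AI)] outputs of three fitted classifiers for one snippet.
    rf, gb, lr = [0.90, 0.10], [0.45, 0.55], [0.45, 0.55]
    avg = soft_vote([rf, gb, lr])
    print("soft vote:", [round(p, 3) for p in avg], "->", argmax(avg))  # class 0
    print("hard vote ->", hard_vote([rf, gb, lr]))  # class 1

    # Hypothetical imbalanced labels: the minority class gets the larger weight.
    labels = [0, 0, 0, 1]
    w = inverse_frequency_weights(labels, n_classes=2)
    probs = [[0.8, 0.2], [0.7, 0.3], [0.9, 0.1], [0.4, 0.6]]
    print("class weights:", [round(x, 3) for x in w])
    print("weighted CE:", round(weighted_cross_entropy(probs, labels, w), 4))
```

The example shows why soft voting differs from a majority vote: two members lean only slightly toward class 1, but the confident first member pulls the averaged probabilities to class 0.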
MedHastra@DravidianLangTech 2026: Piecewise Style Classification for Telugu Prompt Recovery Using XLM-RoBERTa
Shruti Chandrasekar | Vedajanaani R S | Vijayalakshmi P
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
We present a system for the DravidianLangTech @ ACL 2026 shared task on Telugu Prompt-Style Recovery (B et al., 2026). The task requires classifying Telugu text into one of nine communicative styles: Formal, Informal, Optimistic, Pessimistic, Humorous, Serious, Inspiring, Authoritative, and Persuasive. Our approach fine-tunes the multilingual XLM-RoBERTa base model with a piecewise segment comparison strategy that evaluates distinct stylistic markers across sentence segments, enabling richer contextual discrimination between closely related styles. On the official test set, our system achieves a macro F1 score of 0.1205, accuracy of 0.1196, precision of 0.1205, and recall of 0.1231. We analyze the challenges posed by stylistic ambiguity in low-resource Telugu NLP and discuss directions for future improvement.
2024
Severity Classification and Dysarthric Speech Detection using Self-Supervised Representations
B Sanjay | Priyadharshini M.K | Vijayalakshmi P | Nagarajan T
Proceedings of the 21st International Conference on Natural Language Processing (ICON)
Automatic detection of dysarthria and classification of its severity from speech provide a non-invasive and efficient diagnostic tool, offering clinicians valuable insights to guide treatment and therapy decisions. Our study evaluated two pre-trained models, wav2vec2-BASE and distilALHuBERT, as feature extractors for building dysarthric speech detection and severity-level classification systems. We conducted experiments on the TDSC dataset using two approaches: a machine learning model (support vector machine, SVM) and a deep learning model (convolutional neural network, CNN). Features derived from distilALHuBERT significantly outperformed those from wav2vec2-BASE on both dysarthric speech detection and severity classification, achieving 99% accuracy in automatic detection and 95% accuracy in severity classification.
Pronunciation scoring for dysarthric speakers with DNN-HMM based goodness of pronunciation (GoP) measure
Shruti Jeyaraman | Anantha K. Krishnan | Vijayalakshmi P | Nagarajan T
Proceedings of the 21st International Conference on Natural Language Processing (ICON)
Dysarthria is a neurological motor speech disorder caused by cranial damage that interferes with the muscles needed for the correct pronunciation of sounds and intelligible speech. Computer-Aided Pronunciation Training (CAPT) systems, traditionally used to assess the pronunciation of L2 learners, offer a way to detect and score mispronounced sounds in dysarthric speakers without human intervention. In this work, we present a phonetic-level, DNN-HMM-based Goodness of Pronunciation (GoP) measure for pronunciation scoring on a corpus of native Tamil dysarthric speakers. The scores are computed from the posteriors of subphonemic units called senones, with a focus on their prevalence across phones and their transitions across HMM states. The phonetic-level scores obtained for speakers of different severity levels help establish speaker-specific pronunciation trends through an objective log-likelihood metric, in contrast to subjective evaluations by Speech Language Therapists (SLTs).
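The abstract does not state the scoring formula. As a point of reference only, one common DNN-based GoP formulation (which may differ from the senone-prevalence and state-transition variant described here) sums senone posteriors into phone posteriors and averages a log posterior ratio over the frames aligned to the canonical phone $p$. In this sketch, $\mathbf{o}_t$ is the acoustic frame at time $t$, $\mathcal{S}(p)$ is the set of senones tied to phone $p$, $[t_s, t_e]$ is the forced-alignment segment of $p$, and $Q$ is the phone inventory; all of these symbols are assumptions introduced for illustration:

$$
P(p \mid \mathbf{o}_t) = \sum_{s \in \mathcal{S}(p)} P(s \mid \mathbf{o}_t)
$$

$$
\mathrm{GoP}(p) = \frac{1}{t_e - t_s + 1} \sum_{t = t_s}^{t_e} \log \frac{P(p \mid \mathbf{o}_t)}{\max_{q \in Q} P(q \mid \mathbf{o}_t)}
$$

Under this formulation the score is at most 0, reaching 0 only when the canonical phone is the most probable phone at every aligned frame; more negative values indicate weaker pronunciation of that phone.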
DesiPayanam: developing an Indic travel partner
Diviya K N | Mrinalini K | Vijayalakshmi P | Thenmozhi J | Nagarajan T
Proceedings of the 21st International Conference on Natural Language Processing (ICON)
Domain-specific machine translation (MT) systems are essential for bridging the communication gap between people across different businesses, economies, and countries. India, a linguistically rich country with a booming tourism industry, is an ideal market for such a system. This work develops a domain-specific, transformer-based MT system for Hindi-to-Tamil translation. A neural MT (NMT) model is trained from scratch, and its architectural hyper-parameters are varied to analyze their effect on translation performance. In addition, a pretrained transformer MT model is fine-tuned to better suit the tourism domain. These experiments improve the BLEU scores of the translation system by up to 1% for the model trained from scratch and up to 4% for the fine-tuned model.