Janani Hariharakrishnan


2026

This paper describes our system for SemEval-2026 Task 13, Subtask A: detecting whether a given code snippet is AI-generated or human-written. We explored a range of approaches, from classical machine learning baselines using TF-IDF representations to fine-tuned transformer models pre-trained on code, specifically CodeBERT and GraphCodeBERT. Our experiments revealed a notable degradation in performance when CodeBERT was trained beyond an optimal number of steps, suggesting that continued training within an epoch leads to overfitting or representation drift. GraphCodeBERT, by contrast, yielded our best submission, with a macro F1 score of 0.36866. Our findings highlight the sensitivity of code-specific transformers to training duration and suggest that early checkpoint selection is critical for this task.
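The checkpoint-selection idea can be illustrated with a minimal sketch: score each saved checkpoint on a held-out validation set by macro F1 and keep the best one. The function names `macro_f1` and `select_checkpoint`, the step numbers, and the toy predictions are hypothetical and are not taken from the paper; the sketch only assumes that validation predictions are available per checkpoint.

```python
from typing import Dict, List, Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def select_checkpoint(val_labels: Sequence[int],
                      preds_by_step: Dict[int, List[int]]) -> int:
    """Return the training step whose validation predictions give the highest macro F1."""
    return max(preds_by_step, key=lambda step: macro_f1(val_labels, preds_by_step[step]))


# Toy validation labels (0 = human-written, 1 = AI-generated); values are illustrative only.
val_labels = [0, 0, 1, 1, 0, 1]
preds_by_step = {
    500: [0, 0, 1, 1, 0, 0],   # slightly undertrained
    1000: [0, 0, 1, 1, 0, 1],  # best on validation
    2000: [1, 1, 1, 1, 1, 1],  # collapsed to one class after further training
}
best_step = select_checkpoint(val_labels, preds_by_step)
print(best_step, round(macro_f1(val_labels, preds_by_step[best_step]), 3))
```

Macro F1 weights both classes equally, so a checkpoint that collapses to a single class is penalized even when that class is frequent.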

2025

Emotion recognition in textual data is a crucial NLP task with applications in sentiment analysis and mental health monitoring. SemEval-2025 Task 11 introduces a multilingual dataset spanning 28 languages, including low-resource ones, to improve cross-lingual emotion detection. Our approach uses T5 for English and mT5 for the other languages, fine-tuning both for multi-label classification and emotion intensity estimation. Our findings demonstrate the effectiveness of transformer-based models in capturing nuanced emotional expressions across diverse languages.
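One common way to pose multi-label classification for a text-to-text model such as T5 is to train it to generate the active labels as a comma-separated string and to parse that string back into a binary vector. The sketch below shows only that encoding step. The abstract does not state the target format that was used, so the label list `EMOTIONS`, the `"none"` placeholder, and both helper names are assumptions made for illustration.

```python
from typing import List, Set

# Assumed label inventory for illustration; the actual label set depends on the language track.
EMOTIONS: List[str] = ["anger", "fear", "joy", "sadness", "surprise"]


def encode_target(active: Set[str]) -> str:
    """Turn a set of active emotions into a target string in a fixed label order."""
    ordered = [e for e in EMOTIONS if e in active]
    return ", ".join(ordered) if ordered else "none"


def decode_prediction(text: str) -> List[int]:
    """Parse generated text into a binary vector, ignoring unknown tokens."""
    tokens = {t.strip().lower() for t in text.split(",")}
    return [1 if e in tokens else 0 for e in EMOTIONS]


print(encode_target({"joy", "fear"}))
print(decode_prediction("fear, joy"))
```

A fixed label order keeps the targets deterministic, and ignoring unknown tokens at decoding time makes the parser tolerant of malformed generations.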