Abhin B


2025

Transformer-Based Analysis of Adaptive and Maladaptive Self-States in Longitudinal Social Media Data
Abhin B | Renukasakshi V Patil
Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025)

The CLPsych workshop, held annually since 2014, promotes the application of computational linguistics to behavioral analysis and mental health assessment. The CLPsych 2025 shared task, extending the framework of the 2022 iteration, leverages the MIND framework to model temporal fluctuations in mental states. The shared task comprises three subtasks, each posing substantial challenges for natural language processing (NLP) systems, which must produce sensitive and precise analyses of adaptive and maladaptive behaviors. In this study, we employed a range of modeling strategies tailored to the requirements and expected outputs of each subtask. Our approach largely relied on established pretrained language models such as BERT, Longformer, and Pegasus, diverging from the prevalent trend of prompt-tuned large language models. We achieved an overall ranking of 13th, with subtask rankings of 8th in Task 1a, 13th in Task 1b, 8th in Task 2, and 7th in Task 3. These results highlight the efficacy of our methods while underscoring areas for further refinement in handling complex behavioral data.
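
As a rough illustration of the kind of pipeline the abstract describes (fine-tuning an encoder such as BERT or Longformer to label posts as adaptive or maladaptive self-states), the sketch below shows a minimal Hugging Face setup. The model name, binary label encoding, and training details are assumptions for illustration, not the authors' exact CLPsych 2025 configuration.

```python
# Minimal sketch: adaptive vs. maladaptive post classification with a BERT encoder.
# Model choice, label encoding, and hyperparameters are illustrative assumptions,
# not the authors' exact shared-task configuration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # e.g. "allenai/longformer-base-4096" for long timelines
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

posts = ["I reached out to a friend and it helped.",
         "Nothing I do matters, so why try."]
labels = torch.tensor([0, 1])  # 0 = adaptive, 1 = maladaptive (hypothetical encoding)

batch = tokenizer(posts, padding=True, truncation=True, max_length=512, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss computed internally
outputs.loss.backward()
optimizer.step()
```

The same encoder-classifier pattern extends to the other subtasks by swapping the backbone (e.g. Pegasus for summary-style outputs) and the label set.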

2024

SCaLAR at SemEval-2024 Task 8: Unmasking the machine: Exploring the power of RoBERTa Ensemble for Detecting Machine Generated Text
Anand Kumar | Abhin B | Sidhaarth Murali
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

SemEval-2024 Task 8, Subtask B, is a shared task concerned with detecting which of five generative models produced a given text: davinci, bloomz, chatGPT, cohere, or dolly. The task is important given the rapid rise of generative models and their ability to draft emails and formal documents, write and pass examinations, and take on ever more tasks with each passing day. Attributing a text to the pre-trained model that generated it also helps in analyzing how each model's training data has shaped its ability to perform a given task. In the proposed approach, data augmentation was performed to handle longer sentences, with the augmented segments assigned the same parent label. Three RoBERTa models were then trained on different segments of the augmented data and combined with a voting classifier weighted by their R2 scores, achieving higher accuracy than any individual model. The proposed system achieved an overall validation accuracy of 97.05% and a test accuracy of 76.25%, placing 18th on the leaderboard.
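
As a rough sketch of the ensembling step the abstract describes (three segment-specific RoBERTa classifiers combined by weighted voting over the five generator classes), the snippet below uses the Hugging Face transformers API. The checkpoint names, equal weights, and label indexing are assumptions for illustration rather than the authors' exact configuration.

```python
# Minimal sketch of the voting ensemble described above: three RoBERTa classifiers,
# each fine-tuned on a different data segment, soft-vote on which generator
# (davinci / bloomz / chatGPT / cohere / dolly) produced a text.
# Checkpoint names and equal weights are illustrative assumptions, not the authors' setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINTS = ["scalar/roberta-segment1",   # hypothetical fine-tuned checkpoints
               "scalar/roberta-segment2",
               "scalar/roberta-segment3"]
WEIGHTS = [1.0, 1.0, 1.0]  # the paper derives per-model weights from R2 scores

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
models = [AutoModelForSequenceClassification.from_pretrained(c, num_labels=5).eval()
          for c in CHECKPOINTS]

def ensemble_predict(text: str) -> int:
    """Return the index of the predicted generator via weighted soft voting."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        summed = sum(w * torch.softmax(m(**inputs).logits, dim=-1)
                     for m, w in zip(models, WEIGHTS))
    return int(summed.argmax(dim=-1))

print(ensemble_predict("This essay was drafted to illustrate the pipeline."))
```

Soft voting over class probabilities, rather than hard majority voting, lets a confident model outweigh two uncertain ones, which is one plausible reason an ensemble of this kind can beat its individual members.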