Shivansh Subramanian


2024

iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers
Harshit Gupta | Manav Chaudhary | Shivansh Subramanian | Tathagata Raha | Vasudeva Varma
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper describes our approach for SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense. The BRAINTEASER task comprises multiple-choice Question Answering designed to evaluate a model’s lateral thinking capabilities. It consists of Sentence Puzzle and Word Puzzle subtasks that require models to defy default commonsense associations and exhibit unconventional thinking. We propose a unique strategy to improve the performance of pre-trained language models, notably the Gemini 1.0 Pro Model, on both subtasks. We employ static and dynamic few-shot prompting techniques and introduce a model-generated reasoning strategy that utilizes the LLM’s own reasoning capabilities to improve performance. Our approach outperformed the baseline models by a considerable margin but fell short of the human annotators, highlighting the efficacy of the proposed strategies while leaving room for improvement.
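The sketch below illustrates the idea behind dynamic few-shot prompting as described in the abstract: retrieve the solved exemplars most similar to the test puzzle and prepend them, with their reasoning, to the prompt. It assumes the sentence-transformers library for retrieval; the exemplar pool, encoder name, and prompt wording are illustrative placeholders, not the authors' exact setup.

```python
# Minimal sketch of dynamic few-shot prompting for a brain-teaser MCQ.
# Assumptions: a small pool of solved exemplars and the sentence-transformers
# library; all names here are hypothetical, not the paper's exact pipeline.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical exemplar pool of solved puzzles with model-generated reasoning.
exemplar_pool = [
    {
        "question": "A man shaves several times a day, yet he still has a beard. How?",
        "choices": ["He is a barber.", "He uses a blunt razor.",
                    "He only shaves at night.", "None of the above."],
        "reasoning": "He shaves other people, not himself, so the contradiction disappears.",
        "answer": "He is a barber.",
    },
    # ... more solved puzzles ...
]

def build_dynamic_prompt(question, choices, k=2):
    """Select the k exemplars most similar to the test question and
    assemble a few-shot prompt that ends with the unsolved puzzle."""
    query_emb = encoder.encode(question, convert_to_tensor=True)
    pool_embs = encoder.encode([e["question"] for e in exemplar_pool],
                               convert_to_tensor=True)
    scores = util.cos_sim(query_emb, pool_embs)[0]
    top_idx = scores.argsort(descending=True)[:k]

    parts = ["Solve the brain teaser by thinking laterally.\n"]
    for i in top_idx:
        ex = exemplar_pool[int(i)]
        opts = "\n".join(f"({j}) {c}" for j, c in enumerate(ex["choices"]))
        parts.append(f"Question: {ex['question']}\n{opts}\n"
                     f"Reasoning: {ex['reasoning']}\nAnswer: {ex['answer']}\n")
    opts = "\n".join(f"({j}) {c}" for j, c in enumerate(choices))
    parts.append(f"Question: {question}\n{opts}\nReasoning:")
    return "\n".join(parts)
```

The resulting prompt string would then be sent to the LLM (e.g. via the Gemini API), with the static variant simply fixing the exemplars in advance instead of retrieving them per question.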

2022

Towards Capturing Changes in Mood and Identifying Suicidality Risk
Sravani Boinepelli | Shivansh Subramanian | Abhijeeth Singam | Tathagata Raha | Vasudeva Varma
Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology

This paper describes our systems for CLPsych’s 2022 Shared Task. Subtask A involves capturing moments of change in an individual’s mood over time, while Subtask B asks us to identify the suicidality risk of a user. We explore multiple machine learning and deep learning methods for both subtasks, taking real-life applicability into account when designing the architectures. Our team achieved top results in different categories for both subtasks. Task A was evaluated at the post level (using macro-averaged F1) and at the window-based timeline level (using macro-averaged precision and recall). We scored a post-level F1 of 0.520 and ranked second with a timeline-level recall of 0.646. Task B was a user-level task where we also came in second with a micro F1 of 0.520 and placed third on the leaderboard with a macro F1 of 0.380.
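For clarity on the post-level and user-level scores quoted above, the following sketch shows how macro- and micro-averaged F1 are computed with scikit-learn; the label names are placeholders for the task’s mood-change and risk categories, not the official shared-task scorer.

```python
# Illustrative scoring sketch, assuming gold and predicted label lists;
# labels are hypothetical stand-ins for the shared task's categories.
from sklearn.metrics import f1_score

# Subtask A: one label per post (e.g. switch / escalation / none).
gold_posts = ["switch", "none", "escalation", "none"]
pred_posts = ["switch", "none", "none", "none"]
print("Post-level macro F1:", f1_score(gold_posts, pred_posts, average="macro"))

# Subtask B: one risk label per user.
gold_users = ["low", "high", "moderate"]
pred_users = ["low", "moderate", "moderate"]
print("User-level micro F1:", f1_score(gold_users, pred_users, average="micro"))
print("User-level macro F1:", f1_score(gold_users, pred_users, average="macro"))
```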