Ishaan Shukla


2025

Know Thyself: Validating Knowledge Awareness of LLM-based Persona Agents
Savita Bhat | Ishaan Shukla | Shirish Karande
Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)

Large Language Models (LLMs) have demonstrated remarkable capability in simulating human behaviors, personality, and language. Such synthetic agents with personalities are considered cost-effective proxies for real users to facilitate crowd-sourcing efforts like annotations, surveys, and A/B testing. Accordingly, it is imperative to validate the knowledge awareness of these LLM persona agents when they are customized for further usage. Currently, there is no established way to perform such evaluation and apply appropriate mitigation. In this work, we propose a generic evaluation approach to validate LLM-based persona agents for correctness, relevance, and diversity in the context of self-awareness and domain knowledge. We evaluate the efficacy of this framework using three LLMs (Llama, GPT-4o, and Gemma) for domains such as air travel, gaming, and fitness. We also experiment with advanced prompting strategies such as ReAct and Reflexion. We find that though GPT-4o and Llama demonstrate comparable performance, they fail some of the basic consistency checks under certain perturbations.
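The consistency checks mentioned in the abstract can be illustrated with a minimal sketch: ask a persona agent the same question in its original and perturbed (paraphrased) form and compare the answers. This is not the paper's framework; the persona_agent stub, its profile fields, and the exact-match comparison below are simplifying assumptions for illustration only.

```python
# Minimal sketch of a knowledge-awareness consistency check for a persona agent.
# The persona_agent stub is a placeholder assumption; in practice it would wrap an
# LLM call (e.g., Llama, GPT-4o, or Gemma) conditioned on a persona prompt.

def persona_agent(question: str) -> str:
    """Stub agent: answers from a fixed persona profile (assumed for illustration)."""
    profile = {"favorite airline": "SkyJet", "home airport": "BOM"}
    for key, value in profile.items():
        if key in question.lower():
            return value
    return "I don't know."

def consistency_check(original: str, perturbed: str) -> bool:
    """Return True if the agent answers a question and its paraphrase identically."""
    return persona_agent(original).strip().lower() == persona_agent(perturbed).strip().lower()

checks = [
    ("What is your favorite airline?",
     "Which airline do you prefer, that is, your favorite airline?"),
    ("What is your home airport?",
     "From which home airport do you usually depart?"),
]
for original, perturbed in checks:
    verdict = "consistent" if consistency_check(original, perturbed) else "inconsistent"
    print(original, "->", verdict)
```

A fuller harness would aggregate such checks per domain and score correctness, relevance, and diversity separately, but the pass/fail comparison above captures the basic perturbation test.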

2024

PICT at StanceEval2024: Stance Detection in Arabic using Ensemble of Large Language Models
Ishaan Shukla | Ankit Vaidya | Geetanjali Kale
Proceedings of the Second Arabic Natural Language Processing Conference

This paper outlines our approach to the StanceEval 2024 Arabic Stance Evaluation shared task. The goal of the task was to identify the stance (one of Favor, Against, or None) of tweets on three topics: COVID-19 Vaccine, Digital Transformation, and Women Empowerment. Our approach consists of efficiently fine-tuning BERT-based models for both Single-Task Learning and Multi-Task Learning, the details of which are discussed. Finally, an ensemble was implemented over the best-performing models to maximize overall performance. We achieved a macro F1 score of 78.02% in this shared task. Our codebase is available publicly.
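As an illustration of the ensembling step described in the abstract, the sketch below averages class probabilities from several fine-tuned BERT-style classifiers (soft voting) and returns the argmax label. The checkpoint paths are hypothetical placeholders for fine-tuned stance models, and the label order is an assumption; this is not the paper's released code.

```python
# Sketch of a soft-voting ensemble over fine-tuned BERT-based stance classifiers.
# Checkpoint paths are hypothetical placeholders for models fine-tuned on the
# StanceEval 2024 data; the label order (Favor, Against, None) is an assumption.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINTS = ["path/to/stance-model-1", "path/to/stance-model-2"]  # hypothetical
LABELS = ["Favor", "Against", "None"]

def predict_stance(tweet: str) -> str:
    """Average softmax probabilities across ensemble members and pick the argmax."""
    probs = torch.zeros(len(LABELS))
    for ckpt in CHECKPOINTS:
        tokenizer = AutoTokenizer.from_pretrained(ckpt)
        model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=len(LABELS))
        model.eval()
        inputs = tokenizer(tweet, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        probs += torch.softmax(logits, dim=-1).squeeze(0)
    return LABELS[int(torch.argmax(probs))]

print(predict_stance("example tweet text"))
```

Soft voting is one common way to combine such models; a majority vote over hard predictions is an equally plausible alternative.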

CLTeam1 at SemEval-2024 Task 10: Large Language Model based ensemble for Emotion Detection in Hinglish
Ankit Vaidya | Aditya Gokhale | Arnav Desai | Ishaan Shukla | Sheetal Sonawane
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper outlines our approach for the ERC sub-task of the SemEval 2024 EDiReF Shared Task. In this sub-task, an emotion had to be assigned to each utterance in a dialogue, classifying it into one of the following classes: disgust, contempt, anger, neutral, joy, sadness, fear, or surprise. Our proposed system makes use of an ensemble of language-specific RoBERTa and BERT models to tackle the problem. Our system achieved a weighted F1-score of 44% on this task. We conducted comprehensive ablations and suggest directions for future work. Our codebase is available publicly.
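The weighted F1-score reported above averages per-class F1 values weighted by each class's support, which matters for imbalanced emotion distributions. The toy example below shows how that metric is computed with scikit-learn; the gold and predicted labels are made up for demonstration and are not taken from the EDiReF data.

```python
# Toy illustration of the weighted F1-score used to evaluate the ERC sub-task.
# The gold/predicted labels are invented for demonstration only.
from sklearn.metrics import f1_score

EMOTIONS = ["disgust", "contempt", "anger", "neutral", "joy", "sadness", "fear", "surprise"]

gold = ["joy", "neutral", "anger", "neutral", "sadness", "joy", "fear", "neutral"]
pred = ["joy", "neutral", "anger", "joy", "sadness", "neutral", "fear", "neutral"]

# average="weighted" weights each class's F1 by its support in the gold labels.
score = f1_score(gold, pred, labels=EMOTIONS, average="weighted", zero_division=0)
print(f"weighted F1 = {score:.2f}")
```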