Harshit Gupta
In a rapidly globalizing and digital world, content such as book and product reviews created by people from diverse cultures is read and consumed by others from different corners of the world. In this paper, we investigate the extent and patterns of gaps in the understandability of book reviews due to the presence of culturally specific items and elements that might be alien to users from another culture. Our user study on 57 book reviews from Goodreads reveals that 83% of the reviews had at least one culture-specific, difficult-to-understand element. We also evaluate the efficacy of GPT-4o in identifying such items, given the cultural background of the reader; the results are mixed, implying significant scope for improvement. Our datasets are available here: https://github.com/sougata-ub/reading_between_lines.
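The abstract does not include code; the following is a minimal sketch of how a GPT-4o query for culture-specific elements might look. The prompt wording, the flag_cultural_items helper, and the example review are assumptions for illustration, not the authors' actual setup; it assumes the standard openai Python client and an OPENAI_API_KEY in the environment.

# Illustrative sketch only -- not the paper's actual prompts or pipeline.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def flag_cultural_items(review: str, reader_culture: str) -> str:
    """Ask GPT-4o which parts of a review may be hard to understand
    for a reader from the given cultural background."""
    prompt = (
        f"A reader from {reader_culture} is reading this book review.\n"
        "List any culture-specific items (food, idioms, festivals, "
        "references) that may be difficult for them to understand, "
        "and briefly explain why.\n\n"
        f"Review:\n{review}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

print(flag_cultural_items(
    "The plot dragged on like a never-ending saas-bahu serial.", "Germany"))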
Traditional evaluation metrics like BLEU and ROUGE fall short of capturing the nuanced qualities of generated text, particularly when there is no single ground truth. In this paper, we explore the potential of Large Language Models (LLMs), specifically Google Gemini, to serve as automatic evaluators for non-standardized metrics in summarization and dialog-based tasks. We conduct experiments across multiple prompting strategies to examine how LLMs fare as quality annotators when compared with human judgments on the SummEval and USR datasets, asking the model to generate both a score and a justification for that score. Furthermore, we explore the robustness of the LLM evaluator by using perturbed inputs. Our findings suggest that while LLMs show promise, their alignment with human evaluators is limited, they are not robust against perturbations, and significant improvements are required before they can be used on their own as reliable evaluators for subjective metrics.
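As a rough illustration of the LLM-as-evaluator setup described above, the sketch below asks a Gemini model for a 1-5 score plus a justification on a subjective metric. The prompt wording, the rate_summary helper, and the model identifier are assumptions, not the paper's actual evaluation protocol; it assumes the google-generativeai Python package.

# Illustrative sketch only -- prompt, parsing, and model name are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.0-pro")  # assumed model identifier

def rate_summary(source: str, summary: str, metric: str = "coherence") -> str:
    """Ask the model for a 1-5 score on a subjective metric plus a short justification."""
    prompt = (
        f"Rate the {metric} of the summary below on a scale of 1 to 5.\n"
        "Reply as 'Score: <n>' followed by a one-sentence justification.\n\n"
        f"Source article:\n{source}\n\nSummary:\n{summary}"
    )
    return model.generate_content(prompt).text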
This paper describes our approach for SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense. The BRAINTEASER task comprises multiple-choice question answering designed to evaluate models’ lateral-thinking capabilities. It consists of Sentence Puzzle and Word Puzzle subtasks that require models to defy default commonsense associations and exhibit unconventional thinking. We propose a strategy to improve the performance of pre-trained language models, notably the Gemini 1.0 Pro model, on both subtasks. We employ static and dynamic few-shot prompting techniques and introduce a model-generated reasoning strategy that utilizes the LLM’s reasoning capabilities to improve performance. Our approach demonstrated significant improvements, outperforming the baseline models by a considerable margin while still falling short of the human annotators, which highlights the efficacy of the proposed strategies.
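A minimal sketch of how a few-shot prompt for such a multiple-choice puzzle could be assembled is given below; the exemplar format, field names, and build_brainteaser_prompt helper are hypothetical and not the authors' actual prompts.

# Illustrative sketch of few-shot prompt assembly -- exemplar wording and
# formatting are assumptions, not the authors' actual prompts.
def build_brainteaser_prompt(exemplars, question, choices):
    """Assemble a few-shot prompt: solved puzzles (with reasoning) first, then the target."""
    parts = ["Solve each lateral-thinking puzzle by choosing the best option.\n"]
    for ex in exemplars:  # each exemplar: {"question", "choices", "reasoning", "answer"}
        opts = "\n".join(f"({i}) {c}" for i, c in enumerate(ex["choices"]))
        parts.append(
            f"Question: {ex['question']}\n{opts}\n"
            f"Reasoning: {ex['reasoning']}\nAnswer: ({ex['answer']})\n"
        )
    opts = "\n".join(f"({i}) {c}" for i, c in enumerate(choices))
    parts.append(f"Question: {question}\n{opts}\nReasoning:")
    return "\n".join(parts)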
The proliferation of LLMs in various NLP tasks has sparked debates regarding their reliability, particularly in annotation tasks where biases and hallucinations may arise. In this shared task, we address the challenge of distinguishing annotations made by LLMs from those made by human domain experts in the context of COVID-19 symptom detection from tweets in Latin American Spanish. This paper presents BrainStorm @ iREL’s approach to the #SMM4H 2024 Shared Task. Leveraging the inherent topical information in tweets, we propose a novel approach to identify and classify annotations, aiming to enhance the trustworthiness of annotated data.
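Purely as an illustration of annotation-source classification, the sketch below trains a simple TF-IDF plus logistic-regression baseline to label annotations as human- or LLM-produced. The toy data and this baseline are assumptions and do not reflect BrainStorm @ iREL's actual topic-based system.

# Illustrative baseline sketch -- toy data, not the shared-task system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical examples: tweet text plus its annotation, labeled by who
# produced the annotation ("human" expert vs. "llm").
texts = [
    "tengo fiebre y tos desde ayer [ann: fiebre, tos]",
    "me duele mucho la cabeza hoy [ann: cefalea]",
]
labels = ["human", "llm"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # lexical/topical surface features
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["dolor de garganta y fiebre [ann: odinofagia, fiebre]"]))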