Saurabh Kumar Pandey


2025

CULTURALLY YOURS: A Reading Assistant for Cross-Cultural Content
Saurabh Kumar Pandey | Harshit Budhiraja | Sougata Saha | Monojit Choudhury
Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations

Users from diverse cultural backgrounds frequently struggle to understand online content written by people from a different culture. This paper presents CULTURALLY YOURS (CY), a first-of-its-kind cultural reading assistant designed to identify culture-specific items (CSIs) for users from varying cultural contexts. By leveraging principles of relevance feedback and using culture as a prior, the tool personalizes to the user’s preferences based on their interactions with it. CY can be powered by any LLM that can reason over the user’s cultural background and the English input text, both supplied as part of a prompt that is iteratively refined as the user keeps interacting with the system. In this demo, we use GPT-4o as the back-end. We conduct a user study with 13 users from 8 different geographies. The results demonstrate CY’s effectiveness in enhancing user engagement and personalization, alongside comprehension of cross-cultural content.
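To make the relevance-feedback loop concrete, here is a minimal sketch of how culture can be used as a prior in the prompt and refined from user interactions. This is a hypothetical illustration, not the authors' released implementation: the function name, prompt wording, and feedback representation are all assumptions; only the GPT-4o back-end is taken from the abstract.

```python
# Hypothetical sketch of the CY loop: the user's culture is a prior in the
# prompt, and relevance feedback (items the user asked to have explained)
# refines the prompt on subsequent turns. Not the authors' actual code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def identify_csis(text: str, user_culture: str, feedback: list[str]) -> str:
    """Ask the back-end LLM to flag culture-specific items (CSIs)."""
    feedback_note = (
        "The user previously found these items unfamiliar: " + ", ".join(feedback)
        if feedback else "No feedback collected yet."
    )
    prompt = (
        f"You assist a reader from {user_culture}. "
        "List the culture-specific items in the text below that such a reader "
        "may not understand, each with a one-line explanation.\n"
        f"{feedback_note}\n\nText: {text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Each round, items the user clicks on are appended to `feedback`, so the
# next prompt is conditioned on the interaction history.
```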

Meta-Cultural Competence: Climbing the Right Hill of Cultural Awareness
Sougata Saha | Saurabh Kumar Pandey | Monojit Choudhury
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Numerous recent studies have shown that Large Language Models (LLMs) are biased towards a Western, Anglo-centric worldview, which compromises their usefulness in non-Western cultural settings. However, “culture” is a complex, multifaceted topic, and its awareness, representation, and modeling in LLMs and LLM-based applications can be defined and measured in numerous ways. In this position paper, we ask what it means for an LLM to possess “cultural awareness”, and, through a thought experiment that extends the Octopus test proposed by Bender and Koller (2020), we argue that what an LLM or LLM-based AI system needs in order to be useful across diverse, including completely unseen, cultures is not cultural awareness or knowledge, but rather meta-cultural competence. We lay out the principles of meta-culturally competent AI systems and discuss ways to measure and model such competence.

Reading between the Lines: Can LLMs Identify Cross-Cultural Communication Gaps?
Sougata Saha | Saurabh Kumar Pandey | Harshit Gupta | Monojit Choudhury
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

In a rapidly globalizing, digital world, content such as book and product reviews written by people from one culture is read and consumed by people in other corners of the world. In this paper, we investigate the extent and patterns of gaps in the understandability of book reviews caused by culture-specific items and elements that may be alien to readers from another culture. Our user study on 57 book reviews from Goodreads reveals that 83% of the reviews contained at least one culture-specific, difficult-to-understand element. We also evaluate the efficacy of GPT-4o in identifying such items given the cultural background of the reader; the results are mixed, implying significant scope for improvement. Our datasets are available here: https://github.com/sougata-ub/reading_between_lines.
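As a rough illustration of how predicted culture-specific elements could be scored against user-study annotations, here is a small sketch. The span-level matching criterion (normalized exact match) and the example items are assumptions; the paper's actual evaluation protocol may differ.

```python
# Sketch: score an LLM's predicted culture-specific elements against
# reader annotations. Matching by normalized exact string is an
# assumption, not the paper's protocol.

def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Span-level precision/recall with case- and whitespace-normalized matching."""
    norm = lambda s: s.strip().lower()
    pred = {norm(p) for p in predicted}
    ref = {norm(g) for g in gold}
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    return precision, recall

# Invented example: one true positive out of two predictions and two gold items.
gold = {"Raksha Bandhan", "jugaad"}
pred = {"raksha bandhan", "Bollywood"}
print(precision_recall(pred, gold))  # (0.5, 0.5)
```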

SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation
Saurabh Kumar Pandey | Sachin Vashistha | Debrup Das | Somak Aditya | Monojit Choudhury
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

To understand the complexity of sequence classification tasks, Hahn et al. (2021) proposed sensitivity: the number of disjoint subsets of the input sequence that can each be individually changed to change the output. Though effective, calculating sensitivity at scale under this framework is costly because of its exponential time complexity. We therefore introduce a Sensitivity-based Multi-Armed Bandit framework (SMAB), which provides a scalable approach to computing word-level local (sentence-level) and global (aggregated) sensitivities with respect to an underlying text classifier for any dataset. We establish the effectiveness of our approach through various applications. In a case study on a CheckList-generated sentiment analysis dataset, we show that our algorithm indeed captures intuitively high- and low-sensitivity words. Through experiments on multiple tasks and languages, we show that sensitivity can serve as a proxy for accuracy in the absence of gold data. Lastly, we show that guiding perturbation prompts with sensitivity values in adversarial example generation improves the attack success rate by 15.58%, while using sensitivity as an additional reward in adversarial paraphrase generation yields a 12.00% improvement over SOTA approaches. Warning: contains potentially offensive content.
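The core bandit framing can be sketched as follows: each word is an arm, "pulling" an arm perturbs that word in a sentence, and the reward is whether the classifier's label flips; the running mean reward then estimates the word's global sensitivity without exhaustive enumeration. This is a hypothetical reconstruction under those assumptions; the exploration policy (UCB1 here) and the perturbation function are placeholders, not the paper's actual design.

```python
# Hypothetical sketch of the SMAB idea: words as bandit arms, label flips
# as rewards, UCB1 as a stand-in exploration policy.
import math
from collections import defaultdict

class SensitivityMAB:
    def __init__(self, classifier, perturb):
        self.classify = classifier      # text -> label
        self.perturb = perturb          # (sentence, word) -> perturbed sentence
        self.pulls = defaultdict(int)
        self.reward_sum = defaultdict(float)
        self.t = 0

    def _ucb(self, word):
        n = self.pulls[word]
        if n == 0:
            return float("inf")         # try every arm at least once
        mean = self.reward_sum[word] / n
        return mean + math.sqrt(2 * math.log(self.t + 1) / n)

    def step(self, sentence):
        """One bandit round: pick a word by UCB1, perturb it, record a flip."""
        self.t += 1
        word = max(sentence.split(), key=self._ucb)
        flipped = self.classify(self.perturb(sentence, word)) != self.classify(sentence)
        self.pulls[word] += 1
        self.reward_sum[word] += float(flipped)

    def sensitivity(self, word):
        """Estimated global sensitivity: mean flip rate for this word."""
        n = self.pulls[word]
        return self.reward_sum[word] / n if n else 0.0
```

Running `step` over a corpus concentrates pulls on promising words, which is what makes the estimate scalable relative to exhaustive subset enumeration.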

2024

Evaluating ChatGPT against Functionality Tests for Hate Speech Detection
Mithun Das | Saurabh Kumar Pandey | Animesh Mukherjee
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Large language models like ChatGPT have recently shown great promise on a range of tasks, including hate speech detection. However, it is crucial to understand the limitations of these models in order to build robust hate speech detection systems. To bridge this gap, our study evaluates the strengths and weaknesses of ChatGPT in detecting hate speech at a granular level across 11 languages. Our evaluation employs a series of functionality tests that reveal intricate failures of the model which aggregate metrics like macro F1 or accuracy cannot surface. In addition, we investigate the influence of complex emotions, such as the use of emojis in hate speech, on the model’s performance. Our analysis highlights the shortcomings of generative models in detecting certain types of hate speech and underscores the need for further research and improvements in how these models work.
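To illustrate what a functionality test adds over aggregate metrics, here is a toy harness in the spirit of HateCheck-style suites: templated cases are grouped by functionality and accuracy is reported per functionality, exposing failure modes that macro F1 averages away. The functionality names, templates, and placeholder groups below are invented for illustration and are not the paper's test suite.

```python
# Toy functionality-test harness (illustrative only): per-functionality
# accuracy instead of a single aggregate score.
from collections import defaultdict

TESTS = [
    # (functionality, template, expected label)
    ("negation", "I don't hate [GROUP].", "non-hateful"),
    ("slur_homonym", "The river bank was flooded.", "non-hateful"),
    ("emoji_hate", "[GROUP] are trash 🤮", "hateful"),
]

def run_suite(model, groups=("group_a", "group_b")):
    """model: text -> 'hateful' | 'non-hateful'. Returns per-functionality accuracy."""
    correct, total = defaultdict(int), defaultdict(int)
    for func, template, expected in TESTS:
        for g in groups:
            text = template.replace("[GROUP]", g)
            total[func] += 1
            correct[func] += int(model(text) == expected)
    return {func: correct[func] / total[func] for func in total}
```

A model can score well on aggregate accuracy while failing an entire functionality (e.g. every negation case), which is exactly the kind of failure this breakdown makes visible.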