2025
Navigating the Cultural Kaleidoscope: A Hitchhiker’s Guide to Sensitivity in Large Language Models
Somnath Banerjee | Sayan Layek | Hari Shrawgi | Rajarshi Mandal | Avik Halder | Shanu Kumar | Sagnik Basu | Parag Agrawal | Rima Hazra | Animesh Mukherjee
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Cultural harm arises in LLMs when these models fail to align with specific cultural norms, resulting in misrepresentations or violations of cultural values. This work addresses the challenges of ensuring cultural sensitivity in LLMs, especially in small-parameter models that often lack the extensive training data needed to capture global cultural nuances. We present two key contributions: (1) a cultural harm test dataset, created to assess model outputs across different cultural contexts through scenarios that expose potential cultural insensitivities, and (2) a culturally aligned preference dataset, aimed at restoring cultural sensitivity through fine-tuning based on feedback from diverse annotators. These datasets facilitate the evaluation and enhancement of LLMs, ensuring their ethical and safe deployment across different cultural landscapes. Our results show that integrating culturally aligned feedback leads to a marked improvement in model behavior, significantly reducing the likelihood of generating culturally insensitive or harmful content.
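As a rough illustration of how such a culturally aligned preference dataset could be used, the sketch below applies a standard DPO-style preference loss over (chosen, rejected) response pairs; this objective and all tensor names are our assumptions for illustration, not the paper's confirmed training recipe.

```python
# Minimal sketch (assumption): preference fine-tuning on culturally aligned
# (chosen, rejected) response pairs with a DPO-style objective. The paper's
# exact recipe may differ.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Preference loss over a batch of (chosen, rejected) pairs.

    Each argument is a 1-D tensor of summed token log-probabilities; 'chosen'
    is the culturally sensitive response preferred by annotators, 'rejected'
    the insensitive alternative.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Dummy log-probabilities, illustration only.
base = torch.randn(4)
loss = dpo_loss(base + 0.5, base - 0.5, base, base)
```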
Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance
Somnath Banerjee | Avik Halder | Rajarshi Mandal | Sayan Layek | Ian Soboroff | Rima Hazra | Animesh Mukherjee
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
Pretrained language models (PLMs) have revolutionized NLP but amplify linguistic inequities in multilingual applications. While prior studies focused on transformer architectures such as BERT, we evaluate large language models (LLMs) including Mistral, TowerInstruct, OpenHathi, Tamil-Llama, and Kan-Llama. Through rigorous testing across eight languages spanning high-resource (English, German, French, Italian, Spanish) and low-resource (Hindi, Tamil, Kannada) settings, we reveal systemic failures in preserving multilingual reliability and adaptability. Using the paradigms ‘each language for itself’ (ELFI) and ‘each language for others’ (ELFO), we highlight the inability of current LLMs to bridge linguistic divides. Even model merging fails to mitigate these gaps, exposing fundamental limitations. These findings emphasize the critical need for reimagining AI architectures to deliver true linguistic inclusivity and equitable performance across diverse languages.
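A hedged sketch of the ELFI/ELFO evaluation loop described above: a fact is edited in a source language and the edited model is probed in every language, so the diagonal of the resulting matrix corresponds to ELFI and the off-diagonal cells to ELFO. The helpers `apply_edit` and `edit_accuracy` are placeholders, not the paper's code.

```python
# Hypothetical sketch of the ELFI / ELFO cross-lingual editing evaluation.
LANGS = ["en", "de", "fr", "it", "es", "hi", "ta", "kn"]

def elfi_elfo_scores(model, edits_by_lang, apply_edit, edit_accuracy):
    """Return accuracy for each (edit language, probe language) pair.

    ELFI corresponds to the diagonal (src == tgt); ELFO to the off-diagonal cells.
    `apply_edit` stands in for a concrete model-editing method and
    `edit_accuracy` for an exact-match scorer; neither name comes from the paper.
    """
    scores = {}
    for src in LANGS:
        edited = apply_edit(model, edits_by_lang[src])   # edit facts phrased in `src`
        for tgt in LANGS:
            scores[(src, tgt)] = edit_accuracy(edited, edits_by_lang[tgt])
    return scores
```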
2024
Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations
Rima Hazra | Sayan Layek | Somnath Banerjee | Soujanya Poria
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Ensuring the safe alignment of large language models (LLMs) with human values is critical as they become integral to applications like translation and question answering. Current alignment methods struggle with dynamic user intentions and complex objectives, making models vulnerable to generating harmful content. We propose Safety Arithmetic, a training-free framework enhancing LLM safety across different scenarios: Base models, Supervised fine-tuned models (SFT), and Edited models. Safety Arithmetic involves Harm Direction Removal to avoid harmful content and Safety Alignment to promote safe responses. Additionally, we present NoIntentEdit, a dataset highlighting edit instances that could compromise model safety if used unintentionally. Our experiments show that Safety Arithmetic significantly improves safety measures, reduces over-safety, and maintains model utility, outperforming existing methods in ensuring safe content generation.
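For intuition only, the sketch below shows harm-direction removal as task-vector arithmetic over model parameters: a scaled "harm vector" (the difference between a harmfully fine-tuned copy and its base) is subtracted from the target weights. This captures the general parameter-steering idea; the paper's exact procedure, scaling, and activation-level safety alignment step are not reproduced here.

```python
# Hypothetical sketch: harm-direction removal as task-vector arithmetic
# over state dicts (not the paper's exact Safety Arithmetic procedure).
import torch

def remove_harm_direction(target_sd, base_sd, harmful_sd, lam=0.5):
    """Subtract a scaled 'harm vector' (harmful - base) from the target weights."""
    safe_sd = {}
    for name, w in target_sd.items():
        harm_vec = harmful_sd[name] - base_sd[name]  # direction induced by harmful fine-tuning
        safe_sd[name] = w - lam * harm_vec
    return safe_sd

# Usage (state dicts of a base model, a harmfully fine-tuned copy, and the model to steer):
# model.load_state_dict(remove_harm_direction(model.state_dict(),
#                                             base.state_dict(), harmful.state_dict()))
```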
Context Matters: Pushing the Boundaries of Open-Ended Answer Generation with Graph-Structured Knowledge Context
Somnath Banerjee | Amruit Sahoo | Sayan Layek | Avik Dutta | Rima Hazra | Animesh Mukherjee
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
This paper introduces a novel framework that combines graph-driven context retrieval with knowledge-graph-based enhancement to hone the proficiency of LLMs, especially in domain-specific community question answering platforms like AskUbuntu, Unix, and ServerFault. We conduct experiments on various LLMs with different parameter sizes to evaluate their ability to ground knowledge and determine factual accuracy in answers to open-ended questions. Our methodology, GraphContextGen, consistently outperforms dominant text-based retrieval systems, demonstrating its robustness and adaptability to a large number of use cases. This advancement highlights the importance of pairing context-rich data retrieval with LLMs, offering a renewed approach to knowledge sourcing and generation in AI systems. We also show that, due to rich contextual data retrieval, the crucial entities, along with the generated answer, remain factually coherent with the gold answer. We shall release the source code and datasets upon acceptance.
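A minimal sketch of graph-driven context retrieval in this spirit: gather the knowledge-graph neighborhood of entities mentioned in a question, serialize it as triples, and prepend it to the prompt. `extract_entities` and `llm_generate` are hypothetical placeholders, and the use of networkx is our assumption rather than the paper's implementation.

```python
# Hypothetical sketch of graph-driven context retrieval for open-ended QA.
import networkx as nx

def graph_context_answer(question, kg: nx.Graph, extract_entities, llm_generate, hops=1):
    # Gather the k-hop neighborhood of every recognized entity.
    nodes = set()
    for ent in extract_entities(question):
        if ent in kg:
            nodes |= set(nx.ego_graph(kg, ent, radius=hops).nodes)
    # Serialize the retrieved subgraph as simple (head, relation, tail) facts.
    facts = [f"{u} -[{d.get('relation', 'related_to')}]-> {v}"
             for u, v, d in kg.subgraph(nodes).edges(data=True)]
    prompt = "Context:\n" + "\n".join(facts) + f"\n\nQuestion: {question}\nAnswer:"
    return llm_generate(prompt)
```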
Sowing the Wind, Reaping the Whirlwind: The Impact of Editing Language Models
Rima Hazra | Sayan Layek | Somnath Banerjee | Soujanya Poria
Findings of the Association for Computational Linguistics: ACL 2024
In the rapidly advancing field of artificial intelligence, the concept of ‘Red-Teaming’ or ‘Jailbreaking’ large language models (LLMs) has emerged as a crucial area of study. This approach is especially significant for assessing and enhancing the safety and robustness of these models. This paper investigates the intricate consequences of such modifications through model editing, uncovering a complex relationship between enhancing model accuracy and preserving its ethical integrity. Our in-depth analysis reveals a striking paradox: while injecting accurate information is crucial for model reliability, it can paradoxically destabilize the model’s foundational framework, resulting in unpredictable and potentially unsafe behaviors. Additionally, we propose a benchmark dataset, NicheHazardQA, to investigate this unsafe behavior both within the same topical domain and across topical domains. This aspect of our research sheds light on how such edits impact the model’s safety metrics and guardrails. Our findings show that model editing serves as a cost-effective tool for topical red-teaming by methodically applying targeted edits and evaluating the resultant model behavior.
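As a rough sketch of the same-topic versus cross-topic probe the abstract describes, the snippet below edits facts from one topic and measures unsafe-response rates on questions from every topic; all helper functions are hypothetical placeholders, not the paper's code.

```python
# Hypothetical sketch: unsafe-response rates after topical model edits.
def unsafe_rates_after_edit(model, edits_by_topic, questions_by_topic,
                            apply_edits, generate, is_unsafe):
    """Return unsafe-response rate for each (edit topic, probe topic) pair.

    The diagonal gives same-topic behavior, the off-diagonal cells cross-topic
    behavior. `apply_edits`, `generate`, and `is_unsafe` are placeholders,
    not the paper's code.
    """
    rates = {}
    for edit_topic, edits in edits_by_topic.items():
        edited = apply_edits(model, edits)
        for probe_topic, questions in questions_by_topic.items():
            flags = [is_unsafe(generate(edited, q)) for q in questions]
            rates[(edit_topic, probe_topic)] = sum(flags) / len(flags)
    return rates
```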