Palash Nandi


2026

The global deployment of Large Language Models (LLMs) underscores the urgent need to evaluate their cultural alignment. However, assessing genuine "cultural awareness" across modalities (text, vision, speech) and languages remains a significant challenge. To comprehensively investigate this domain, we propose MMAC, a systematic framework that encompasses a tri-modally aligned cultural benchmark creation pipeline and a five-dimensional evaluation protocol to assess cross-country awareness disparities, evaluate cross-lingual and cross-modal consistency, and verify cultural knowledge generalization and grounding validity. Given the prevailing Western cultural bias in current models, we focus on 8 Asian countries as our dataset foundation to more acutely reveal potential cultural deficiencies in LLMs. Our dataset, MMAC-bench, features 27,000 human-curated questions across 10 languages. Crucially, it is the first dataset aligned at the input level across text, image, and speech, enabling direct cross-modal transfer tests. Each question consists of multiple-choice options accompanied by open-ended generated explanations, where 79% require multi-step reasoning grounded in cultural context, moving beyond simple memorization. We probe the causes of modal divergence, offering insights into fostering culturally robust MLLMs.

2025

Large Language Models (LLMs) with safe-alignment training are powerful instruments with robust language comprehension capability. Typically LLMs undergo careful alignment training involving human feedback to ensure the acceptance of safe inputs while rejection of harmful or unsafe ones. However, these humongous models are still vulnerable to jailbreak attacks, in which malicious users attempt to generate harmful outputs that safety-aligned LLMs are trained to avoid. In this study, we find that the safety mechanisms in LLMs are predominantly prevalent in the middle-to-late layers. Based on this observation, we introduce a novel white-box jailbreak method SABER (Safety Alignment Bypass via Extra Residuals) that connects two intermediate layer s and e such that s<e with a residual connection, achieving an improvement of 51% over the best performing baseline GCG on HarmBench test set. Moreover, model demonstrates only a marginal shift in perplexity when evaluated on the validation set of HarmBench.

2024

Moderating hate speech (HS) in the evolving online landscape is a complex challenge, compounded by the multimodal nature of digital content. This survey examines recent advancements in HS moderation, focusing on the burgeoning role of large language models (LLMs) and large multimodal models (LMMs) in detecting, explaining, debiasing, and countering HS. We begin with a comprehensive analysis of current literature, uncovering how text, images, and audio interact to spread HS. The combination of these modalities adds complexity and subtlety to HS dissemination. We also identified research gaps, particularly in underrepresented languages and cultures, and highlight the need for solutions in low-resource settings. The survey concludes with future research directions, including novel AI methodologies, ethical AI governance, and the development of context-aware systems. This overview aims to inspire further research and foster collaboration towards responsible and human-centric approaches to HS moderation in the digital age.