Arka Dutta

2026

Large Language Models (LLMs) can exhibit imbalanced biases against vulnerable groups, but how they rationalize stereotypes and rights restrictions targeting mental health entities remains underexplored. We audit a broad suite of open-weight LLMs on stereotype-justification prompts tied to mental health identities. We find that several widely used models endorse harmful stereotypes when explicitly asked to justify them, with endorsement varying across model families, versions, and mental health conditions. Finally, we show that widely used harmful-content evaluation and moderation frameworks often miss these nuanced, discriminatory responses, highlighting a gap in current AI safety evaluation for mental health groups.

pdf bib abs

CUET-2567@DravidianLangTech-ACL 2026: Multimodal Stance and Target Identification in Dravidian Political Memes
Arka Dutta | Anindya Majumder | Adnan Faisal | Hasan Murad
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

In Dravidian languages, political memes progressively shape public opinion and political discourse, influencing digital conversations andpublic narratives. Our paper proposes a multilevel multimodal framework for political meme classification in Tamil and Malayalam as part of the Multi Level Political Meme ClassificationDravidianLangTech@ACL 2026 shared task. The task has involved two levels: Level 1 has identified whether a meme expresses Troll/Oppose or Support/Praise, while Level 2 has determined the specific target category (Individual, Party, or Intersection). We have evaluated unimodal and multimodal architectures to analyze the impact of textual and visual representation. Experimental results have highlighted the importance of a multimodal approach over unimodal approaches. This workconfirms the effectiveness of combining image and text features in meme understanding. Among the evaluated models, the mBERT+ViTarchitecture has achieved the best overall performance across both languages and classification levels. According to the evaluation of shared task we achieved average F1 score of 0.72 securing the 2nd rank in Malayalam task and F1 score of 0.76 in Tamil task securing the 6th rank. However after our experimental evaluation we got best average F1 score of 0.62 for Tamil and 0.49 for Malayalam. Despite moderate results, challenges have remained mainly due to the dataset size, class imbalance, and noisy text extraction from images.

pdf bib abs

What About the Scene With the Hitler Reference? HAUNT: A Framework to Probe LLMs’ Self-consistency in Closed Domains Via Adversarial Nudge
Arka Dutta | Sujan Dutta | Rijul Magu | Soumyajit Datta | Munmun De Choudhury | Ashiqur R. KhudaBukhsh
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Hallucinations pose a critical challenge to the real-world deployment of large language models (LLMs) in high-stakes domains. In this paper, we present a framework for stress testing factual fidelity in LLMs in the presence of adversarial nudge. Our framework consists of three steps. First, we instruct the LLM to produce sets of truths and lies consistent with the closed domain in question. Next, we instruct the LLM to verify the same set of assertions as truths and lies consistent with the same closed domain. Finally, we test the robustness of the LLM against the lies generated (and verified) by itself. Our extensive evaluation, conducted using five widely known proprietary and six open LLMs across two closed domains of popular movies and novels, reveals a wide range of susceptibility to adversarial nudges: even among the strongest proprietary LLMs, Claude exhibits strong resilience, GPT and Grok demonstrate moderate resilience, while Gemini and DeepSeek show weak resilience and open models fall short significantly.

Co-authors

Venues

Fix author