Chirag Agarwal
2026
A Mechanistic Perspective and Difficulty Metric for Unlearning
Jiali Cheng | Ziheng Chen | Chirag Agarwal | Hadi Amiri
Findings of the Association for Computational Linguistics: ACL 2026
A Graph Talks, But Who’s Listening? Rethinking Evaluations for Graph-Language Models
Soham Petkar | Hari Aakash K | Anirudh Vempati | Akshit Sinha | Ponnurangam Kumaraguru | Chirag Agarwal
Findings of the Association for Computational Linguistics: ACL 2026
Recent research has extensively explored the graph-reasoning capabilities of Large Language Models (LLMs) through textual descriptions. However, benchmarks specifically designed for Graph-Language Models (GLMs), which integrate Graph Neural Networks (GNNs) with LLMs, remain significantly underdeveloped. In this work, we first demonstrate that existing GLM evaluations, largely repurposed from unimodal node- and edge-level tasks, fail to assess true multimodal integration. Our analysis reveals that strong performance on these benchmarks is achievable using textual or structural features in isolation, bypassing the need for joint reasoning. To bridge this gap, we introduce CLEGR (Compositional Language-Graph Reasoning), a benchmark explicitly designed to evaluate multimodal reasoning over graph topology and textual semantics. Evaluating representative GLMs on CLEGR shows that they suffer significant performance degradation and that unimodal soft-prompted LLMs perform on par with complex multimodal GLMs. These findings collectively highlight limitations in the graph reasoning capabilities of existing GLMs and provide a foundation for advancing the community toward explicit multimodal reasoning involving graph structure and language.
Towards Understanding the Robustness of Sparse Autoencoders
Ahson Saiyed | Sabrina Sadiekh | Chirag Agarwal
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) remain vulnerable to optimization-based jailbreak attacks that exploit internal gradient structure. While Sparse Autoencoders (SAEs) are widely used for interpretability, their robustness implications remain underexplored. We present a study of integrating pretrained SAEs into transformer residual streams at inference time, without modifying model weights or blocking gradients. Across four model families (Gemma, LLaMA, Mistral, Qwen) and two strong white-box attacks (GCG, BEAST) plus three black-box benchmarks, SAE-augmented models achieve up to a 5x reduction in jailbreak success rate relative to the undefended baseline and reduce cross-model attack transferability. Parametric ablations reveal (i) a monotonic dose-response relationship between L0 sparsity and attack success rate, and (ii) a layer-dependent defense-utility tradeoff, where intermediate layers balance robustness and clean performance. These findings are consistent with a representational bottleneck hypothesis: sparse projection reshapes the optimization geometry exploited by jailbreak attacks.
CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning
Eric Onyame | Akash Ghosh | Subhadip Baidya | Sriparna Saha | Xiuying Chen | Chirag Agarwal
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While large language models (LLMs) have been shown to perform well on monolingual mathematical and commonsense reasoning, they remain unreliable for multilingual medical reasoning applications, hindering their deployment in multilingual healthcare settings. We address this by first introducing CURE-Med-Bench, a high-quality multilingual medical reasoning dataset of open-ended reasoning queries, each with a single verifiable answer, spanning thirteen languages, including underrepresented languages such as Amharic, Yoruba, and Swahili. Building on this dataset, we propose CURE-Med, a curriculum-informed reinforcement learning framework that integrates code-switching-aware supervised fine-tuning and Group Relative Policy Optimization to jointly improve logical correctness and language stability. Across thirteen languages, our approach consistently outperforms strong baselines and scales effectively, achieving 85.21% language consistency and 54.35% logical correctness at 7B parameters and 94.96% language consistency and 70.04% logical correctness at 32B parameters. These results support reliable and equitable multilingual medical reasoning in LLMs. The code and dataset will be made publicly available upon acceptance.
2025
HALLUCINOGEN: Benchmarking Hallucination in Implicit Reasoning within Large Vision Language Models
Ashish Seth | Dinesh Manocha | Chirag Agarwal
Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025)
Large Vision-Language Models (LVLMs) have demonstrated remarkable performance in complex multimodal tasks. However, these models still suffer from hallucinations, particularly when required to implicitly recognize or infer diverse visual entities from images for complex vision-language tasks. To address this challenge, we propose HALLUCINOGEN, a novel visual question answering (VQA) benchmark that employs contextual reasoning prompts as hallucination attacks to evaluate the extent of hallucination in state-of-the-art LVLMs. Our benchmark provides a comprehensive study of the implicit reasoning capabilities of these models by first categorizing visual entities based on the ease of recognition in an image as either salient (prominent, visibly recognizable objects such as a car) or latent entities (such as identifying a disease from a chest X-ray), which are not readily visible and require domain knowledge or contextual reasoning for accurate inference. Next, we design hallucination attacks for both types of entities to assess hallucinations in LVLMs while performing various vision-language tasks, such as locating or reasoning about specific entities within an image, where models must perform implicit reasoning by verifying the existence of the queried entity within the image before generating responses. Finally, our extensive evaluations of eleven LVLMs, including powerful open-source models (like LLaMA-3.2 and DeepSeek-V2), commercial models like Gemini, and two hallucination mitigation strategies across multiple datasets, demonstrate that current LVLMs remain susceptible to hallucination attacks.
On the Impact of Fine-Tuning on Chain-of-Thought Reasoning
Elita Lobo | Chirag Agarwal | Himabindu Lakkaraju
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Large language models have emerged as powerful tools for general intelligence, showcasing advanced natural language processing capabilities that find applications across diverse domains. Despite their impressive performance, recent studies have highlighted the potential for significant enhancements in LLMs’ task-specific performance through fine-tuning strategies like Reinforcement Learning from Human Feedback (RLHF), supervised fine-tuning (SFT), and Quantized Low-Rank Adapters (Q-LoRA). However, previous works have shown that while fine-tuning offers significant performance gains, it also leads to challenges such as catastrophic forgetting and privacy and safety risks. Yet there has been little to no work in *understanding the impact of fine-tuning on the reasoning capabilities of LLMs*. Our research investigates the effect of fine-tuning on the reasoning abilities of LLMs, addressing critical questions regarding the impact of task-specific fine-tuning on overall reasoning capabilities, the influence of fine-tuning on Chain-of-Thought (CoT) reasoning performance, and the implications for the faithfulness of CoT reasoning. By exploring these dimensions, our study shows the impact of fine-tuning on LLM reasoning capabilities: the faithfulness of CoT reasoning decreases on average across four datasets, highlighting potential shifts in the internal mechanisms of LLMs resulting from fine-tuning.
Analyzing Memorization in Large Language Models through the Lens of Model Attribution
Tarun Ram Menta | Susmit Agrawal | Chirag Agarwal
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Large Language Models (LLMs) are prevalent in modern applications but often memorize training data, leading to privacy breaches and copyright issues. Existing research has mainly focused on post-hoc analyses (such as extracting memorized content or developing memorization metrics) without exploring the underlying architectural factors that contribute to memorization. In this work, we investigate memorization from an architectural lens by analyzing how attention modules at different layers impact the model’s memorization and generalization performance. Using attribution techniques, we systematically intervene in the LLM’s architecture by bypassing attention modules at specific blocks while keeping other components like layer normalization and MLP transformations intact. We provide theorems analyzing our intervention mechanism from a mathematical view, bounding the difference in layer outputs with and without our attributions. Our theoretical and empirical analyses reveal that attention modules in deeper transformer blocks are primarily responsible for memorization, whereas earlier blocks are crucial for the model’s generalization and reasoning capabilities. We validate our findings through comprehensive experiments on different LLM families (Pythia and GPT-Neo) and five benchmark datasets. Our insights offer a practical approach to mitigate memorization in LLMs while preserving their performance, contributing to safer and more ethical deployment in real-world applications.
Towards Operationalizing Right to Data Protection
Abhinav Java | Simra Shahid | Chirag Agarwal
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
The widespread practice of indiscriminate data scraping to fine-tune language models (LMs) raises significant legal and ethical concerns, particularly regarding compliance with data protection laws such as the General Data Protection Regulation (GDPR). This practice often results in the unauthorized use of personal information, prompting growing debate within the academic and regulatory communities. Recent works have introduced the concept of generating unlearnable datasets (by adding imperceptible noise to the clean data), such that the underlying model achieves lower loss during training but fails to generalize to the unseen test setting. Though somewhat effective, these approaches are predominantly designed for images and are limited by several practical constraints like requiring knowledge of the target model. To this end, we introduce **RegText**, a framework that injects imperceptible spurious correlations into natural language datasets, effectively rendering them unlearnable without affecting semantic content. We demonstrate RegText’s utility through rigorous empirical analysis of small and large LMs. Notably, RegText can restrict newer models like GPT-4o and Llama from learning on our generated data, resulting in a drop in their test accuracy compared to their zero-shot performance and paving the way for generating unlearnable text to protect public data.
A Survey of Multilingual Reasoning in Language Models
Akash Ghosh | Debayan Datta | Sriparna Saha | Chirag Agarwal
Findings of the Association for Computational Linguistics: EMNLP 2025
While reasoning and multilingual capabilities in Language Models (LMs) have achieved remarkable progress in recent years, their integration into a unified paradigm, multilingual reasoning, is at a nascent stage. Multilingual reasoning requires language models to handle logical reasoning across languages while addressing misalignment, biases, and challenges in low-resource settings. This survey provides the first in-depth review of multilingual reasoning in LMs. We give a systematic overview of existing methods that leverage LMs for multilingual reasoning, specifically outlining the challenges, motivations, and foundational aspects of applying language models to reason across diverse languages. We then cover the standard data resources used for training multilingual reasoning in LMs and the evaluation benchmarks employed to assess their multilingual capabilities. Next, we analyze various state-of-the-art methods and their performance on these benchmarks. Finally, we explore future research opportunities to improve multilingual reasoning in LMs, focusing on enhancing their ability to handle diverse languages and complex reasoning tasks.
EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding
Ashish Seth | Utkarsh Tyagi | Ramaneswaran Selvakumar | Nishit Anand | Sonal Kumar | Sreyan Ghosh | Ramani Duraiswami | Chirag Agarwal | Dinesh Manocha
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in complex multimodal tasks. While MLLMs excel at visual perception and reasoning in third-person and egocentric videos, they are prone to hallucinations, generating coherent yet inaccurate responses. We present EGOILLUSION, the first benchmark to evaluate MLLM hallucinations in egocentric videos. EGOILLUSION comprises 1,400 videos paired with 8,000 human-annotated open- and closed-ended questions designed to trigger hallucinations in both visual and auditory cues in egocentric videos. Evaluations across ten MLLMs reveal significant challenges: even powerful models like GPT-4o and Gemini achieve only 59% accuracy. EGOILLUSION lays the foundation for developing robust benchmarks to evaluate the effectiveness of MLLMs and spurs the development of better egocentric MLLMs with reduced hallucination rates. Our benchmark will be open-sourced for reproducibility.
Co-authors
- Akash Ghosh 2
- Dinesh Manocha 2
- Sriparna Saha 2
- Ashish Seth 2
- Susmit Agrawal 1
- Hadi Amiri 1
- Nishit Anand 1
- Subhadip Baidya 1
- Xiuying Chen 1
- Ziheng Chen 1
- Jiali Cheng 1
- Debayan Datta 1
- Ramani Duraiswami 1
- Sreyan Ghosh 1
- Abhinav Java 1
- Hari Aakash K 1
- Sonal Kumar 1
- Ponnurangam Kumaraguru 1
- Himabindu Lakkaraju 1
- Elita Lobo 1
- Tarun Ram Menta 1
- Eric Onyame 1
- Soham Petkar 1
- Sabrina Sadiekh 1
- Ahson Saiyed 1
- Ramaneswaran Selvakumar 1
- Simra Shahid 1
- Akshit Sinha 1
- Utkarsh Tyagi 1
- Anirudh Vempati 1