Advit Deepak


2025

Identifying Unlearned Data in LLMs via Membership Inference Attacks
Advit Deepak | Megan Mou | Jing Huang | Diyi Yang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Unlearning evaluation has traditionally followed the retrieval paradigm, where adversaries attempt to extract residual knowledge of an unlearning target by issuing queries to a language model. However, the absence of retrievable knowledge does not necessarily prevent an adversary from inferring which targets were intentionally unlearned during post-training optimization. Such inferences can still pose significant privacy risks, as they may reveal sensitive data in the model's training set and the internal policies of model creators. To quantify such privacy risks, we propose a new evaluation framework, Forensic Unlearning Membership Attacks (FUMA), drawing on principles from membership inference attacks. FUMA assesses whether unlearning leaves behind detectable artifacts that can be exploited to infer membership in the forget set. Specifically, we evaluate four major optimization-based unlearning methods on 258 models across diverse unlearning settings and show that examples in the forget set can be identified with up to 99% accuracy. This highlights privacy risks not covered by existing retrieval-based benchmarks. We conclude by discussing recommendations to mitigate these vulnerabilities.
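
The abstract does not spell out FUMA's attack features, so the following is only a minimal sketch of the general membership-inference idea it builds on: if unlearning leaves artifacts (e.g., unusually large loss shifts) on forget-set examples, a simple score-and-threshold attack can flag them. The model identifiers, the loss-gap score, and the threshold rule here are illustrative assumptions, not the paper's method.

```python
# Sketch of a loss-based membership inference attack on an unlearned model.
# Not the FUMA implementation; it only illustrates detecting unlearning artifacts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_nll(model, tokenizer, text: str) -> float:
    """Average negative log-likelihood of `text` under `model`."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()

def infer_forget_membership(texts, unlearned_id, reference_id, threshold=1.0):
    """Flag texts whose loss rose sharply after unlearning (crude decision rule).

    A large NLL gap between the unlearned model and a reference model is treated
    as evidence that the text was in the forget set; a real attack would
    calibrate this score, e.g., against known non-member examples.
    """
    unlearned = AutoModelForCausalLM.from_pretrained(unlearned_id)   # hypothetical checkpoint
    reference = AutoModelForCausalLM.from_pretrained(reference_id)   # hypothetical checkpoint
    tok = AutoTokenizer.from_pretrained(reference_id)
    flags = []
    for text in texts:
        gap = sequence_nll(unlearned, tok, text) - sequence_nll(reference, tok, text)
        flags.append(gap > threshold)
    return flags
```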

2024

Enhancing Large Language Models through Transforming Reasoning Problems into Classification Tasks
Tarun Raheja | Raunak Sinha | Advit Deepak | Will Healy | Jayanth Srinivasa | Myungjin Lee | Ramana Kompella
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper, we introduce a novel approach for enhancing the reasoning capabilities of large language models (LLMs) on constraint satisfaction problems (CSPs) by converting reasoning problems into classification tasks. Our method leverages the LLM's ability to decide when to call a function from a set of logical-linguistic primitives, each of which can interact with a local "scratchpad" memory and a logical inference engine. Invoking these primitives in the correct order writes the constraints to the scratchpad memory and enables the logical engine to verifiably solve the problem. We additionally propose a formal framework for exploring the "linguistic" hardness of CSP reasoning problems for LLMs. Our experimental results demonstrate that, under our proposed method, tasks with significant computational hardness can be converted into a form that is easier for LLMs to solve, yielding a 40% improvement over baselines. This opens up new avenues for future research into hybrid cognitive models that integrate symbolic and neural approaches.
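
The sketch below is a toy illustration of the primitive-plus-scratchpad pattern the abstract describes, not the paper's implementation: an LLM would choose which primitive to call, each call appends a constraint to a scratchpad, and an inference engine then searches for a satisfying assignment. The primitive names, the brute-force solver, and the example CSP are all assumptions made for illustration.

```python
# Toy sketch of logical-linguistic primitives writing constraints to a scratchpad,
# with a tiny brute-force inference engine verifying/solving the resulting CSP.
from itertools import product

scratchpad = []  # accumulated constraints, each a (description, predicate) pair

def assert_not_equal(x, y):
    scratchpad.append((f"{x} != {y}", lambda a, x=x, y=y: a[x] != a[y]))

def assert_less_than(x, y):
    scratchpad.append((f"{x} < {y}", lambda a, x=x, y=y: a[x] < a[y]))

def solve(variables, domain):
    """Return the first assignment satisfying every constraint on the scratchpad."""
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(pred(assignment) for _, pred in scratchpad):
            return assignment
    return None

# In the paper's setting, the LLM (reading a word problem) would decide to emit
# these primitive calls in order; here we hard-code them for the example.
assert_not_equal("a", "b")
assert_less_than("a", "b")
print(solve(["a", "b"], domain=range(3)))  # e.g. {'a': 0, 'b': 1}
```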