Susmit Jha
2025
Polysemantic Dropout: Conformal OOD Detection for Specialized LLMs
Ayush Gupta | Ramneet Kaur | Anirban Roy | Adam D. Cobb | Rama Chellappa | Susmit Jha
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
We propose a novel inference-time out-of-domain (OOD) detection algorithm for specialized large language models (LLMs). Despite achieving state-of-the-art performance on in-domain tasks through fine-tuning, specialized LLMs remain vulnerable to incorrect or unreliable outputs when presented with OOD inputs, posing risks in critical applications. Our method leverages the Inductive Conformal Anomaly Detection (ICAD) framework, using a new non-conformity measure based on the model’s dropout tolerance. Motivated by recent findings on polysemanticity and redundancy in LLMs, we hypothesize that in-domain inputs exhibit higher dropout tolerance than OOD inputs. We aggregate dropout tolerance across multiple layers via a valid ensemble approach, improving detection while maintaining theoretical false alarm bounds from ICAD. Experiments with medical-specialized LLMs show that our approach detects OOD inputs better than baseline methods, with AUROC improvements of 2% to 37% when treating OOD datapoints as positives and in-domain test datapoints as negatives.
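The core ICAD mechanism described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of a simple mean to aggregate per-layer dropout tolerance, and the sign convention (lower tolerance means more OOD-like, hence higher non-conformity) are all assumptions for the sake of the example.

```python
import numpy as np

def nonconformity_from_dropout_tolerance(layer_tolerances):
    """Hypothetical non-conformity score: in-domain inputs are assumed to
    tolerate more dropout, so lower mean tolerance -> higher non-conformity."""
    return -float(np.mean(np.asarray(layer_tolerances, dtype=float)))

def icad_p_value(calibration_scores, test_score):
    """Standard ICAD p-value: fraction of calibration non-conformity scores
    at least as extreme as the test score, counting the test point itself."""
    cal = np.asarray(calibration_scores, dtype=float)
    return (np.sum(cal >= test_score) + 1) / (len(cal) + 1)

# An input is flagged as OOD when its p-value falls below a chosen
# significance level alpha; ICAD bounds the false alarm rate by alpha.
def is_ood(calibration_scores, test_score, alpha=0.05):
    return icad_p_value(calibration_scores, test_score) < alpha
```

With four calibration scores, a test score larger than all of them yields a p-value of (0 + 1) / (4 + 1) = 0.2, so it would be flagged only for alpha above 0.2.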
2024
Task-Agnostic Detector for Insertion-Based Backdoor Attacks
Weimin Lyu | Xiao Lin | Songzhu Zheng | Lu Pang | Haibin Ling | Susmit Jha | Chao Chen
Findings of the Association for Computational Linguistics: NAACL 2024
Textual backdoor attacks pose significant security threats. Current detection approaches, which typically rely on intermediate feature representations or on reconstructing potential triggers, are task-specific and less effective beyond sentence classification, struggling with tasks like question answering and named entity recognition. We introduce TABDet (Task-Agnostic Backdoor Detector), a pioneering task-agnostic method for backdoor detection. TABDet leverages final-layer logits combined with an efficient pooling technique, enabling a unified logit representation across three prominent NLP tasks. TABDet can jointly learn from diverse task-specific models, demonstrating superior detection efficacy over traditional task-specific methods.
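The key idea in the abstract is that variable-length, task-specific logit outputs can be pooled into one fixed-size representation so a single detector can serve multiple tasks. The sketch below is an assumption-laden illustration of that idea, not TABDet's actual pooling: the top-k-then-mean scheme and the function name are invented for this example.

```python
import numpy as np

def pool_logits(logits, k=8):
    """Pool a (seq_len, vocab_size) logit matrix into a fixed-size vector.

    Illustrative scheme (not necessarily TABDet's): keep the top-k logits
    at each token position, then average over positions, yielding a
    k-dimensional representation regardless of sequence length or task.
    """
    logits = np.asarray(logits, dtype=float)
    topk = np.sort(logits, axis=-1)[:, -k:]  # top-k logits per position
    return topk.mean(axis=0)                 # shape (k,), unified across tasks

# Sequences of different lengths map to vectors of the same dimension,
# which is what lets one detector consume outputs from classification,
# question answering, and named entity recognition models alike.
short = pool_logits([[0.1] * 20] * 3, k=4)   # 3-token sequence
long = pool_logits([[0.1] * 20] * 50, k=4)   # 50-token sequence
```

Because both `short` and `long` have shape `(4,)`, a downstream classifier trained on such vectors is agnostic to the originating task's output length.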