Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression
Aryan Gulati, Xingjian Dong, Carlos Hurtado, Sarath Shekkizhar, Swabha Swayamdipta, Antonio Ortega
Abstract
As language models become more general purpose, increased attention needs to be paid to detecting out-of-distribution (OOD) instances, i.e., those not belonging to any of the distributions seen during training. Existing methods for detecting OOD data are computationally complex and storage-intensive. We propose a novel soft clustering approach for OOD detection based on non-negative kernel regression. Our approach greatly reduces computational and space complexities (up to 11× improvement in inference time and 87% reduction in storage requirements). It outperforms existing approaches by up to 4 AUROC points on four benchmarks. We also introduce an entropy-constrained version of our algorithm, leading to further reductions in storage requirements (up to 97% lower than comparable approaches) while retaining competitive performance. Our soft clustering approach for OOD detection highlights its potential for detecting tail-end phenomena in extreme-scale data settings. Our source code is available on Github.- Anthology ID:
- 2024.findings-emnlp.758
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2024
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 12943–12959
- Language:
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2024.findings-emnlp.758/
- DOI:
- 10.18653/v1/2024.findings-emnlp.758
- Cite (ACL):
- Aryan Gulati, Xingjian Dong, Carlos Hurtado, Sarath Shekkizhar, Swabha Swayamdipta, and Antonio Ortega. 2024. Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 12943–12959, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Out-of-Distribution Detection through Soft Clustering with Non-Negative Kernel Regression (Gulati et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2024.findings-emnlp.758.pdf