Evaluating the Practical Utility of Confidence-score based Techniques for Unsupervised Open-world Classification

Sopan Khosla; Rashmi Gangadharaiah

doi:10.18653/v1/2022.insights-1.3

Abstract

Open-world classification in dialog systems requires models to detect open intents while preserving the quality of in-domain (ID) intent classification. In this work, we revisit methods that leverage distance-based statistics for unsupervised out-of-domain (OOD) detection. We show that, despite their superior performance on threshold-independent metrics such as AUROC on the test set, threshold values chosen based on validation-set performance do not generalize well to the test set, resulting in substantially lower ID or OOD detection accuracy and F1-scores. Our analysis shows that this lack of generalizability can be successfully mitigated by setting aside a hold-out set from the validation data for threshold selection (sometimes achieving relative gains as high as 100%). Extensive experiments on seven benchmark datasets show that this fix puts the performance of these methods on par with, or sometimes even better than, the current state-of-the-art OOD detection techniques.
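
The threshold-selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_holdout`, `select_threshold`, and `predict_ood` are hypothetical, the hold-out set is assumed to contain confidence scores with ID/OOD labels, and accuracy is used as the selection criterion purely for illustration (the paper also reports F1-scores).

```python
import random
from typing import List, Sequence, Tuple


def split_holdout(examples: Sequence, holdout_fraction: float = 0.5,
                  seed: int = 0) -> Tuple[list, list]:
    """Split validation examples into (remaining, hold-out) parts.

    Assumption: a simple random split; the paper may split differently.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def select_threshold(scores: Sequence[float], is_ood: Sequence[bool]) -> float:
    """Return the cut-off that maximizes ID-vs-OOD accuracy on the hold-out set.

    An utterance is flagged as OOD when its confidence score is below the cut-off.
    """
    if not scores or len(scores) != len(is_ood):
        raise ValueError("scores and is_ood must be non-empty and equally long")
    distinct = sorted(set(scores))
    # Candidate cut-offs: below all scores, between neighbours, above all scores.
    cuts = [distinct[0] - 1.0]
    cuts += [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    cuts += [distinct[-1] + 1.0]
    best_cut, best_acc = cuts[0], -1.0
    for cut in cuts:
        correct = sum((s < cut) == ood for s, ood in zip(scores, is_ood))
        acc = correct / len(scores)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut


def predict_ood(scores: Sequence[float], threshold: float) -> List[bool]:
    """Flag each score as OOD (True) when it falls below the threshold."""
    return [s < threshold for s in scores]
```

In this sketch, the threshold would be fitted on the hold-out part only and then applied unchanged to test-set scores, for example `predict_ood(test_scores, select_threshold(holdout_scores, holdout_labels))`, where the variable names are again illustrative.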

Anthology ID:: 2022.insights-1.3
Volume:: Proceedings of the Third Workshop on Insights from Negative Results in NLP
Month:: May
Year:: 2022
Address:: Dublin, Ireland
Editors:: Shabnam Tafreshi, João Sedoc, Anna Rogers, Aleksandr Drozd, Anna Rumshisky, Arjun Akula
Venue:: insights
Publisher:: Association for Computational Linguistics
Pages:: 18–23
URL:: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.insights-1.3/
DOI:: 10.18653/v1/2022.insights-1.3
Cite (ACL):: Sopan Khosla and Rashmi Gangadharaiah. 2022. Evaluating the Practical Utility of Confidence-score based Techniques for Unsupervised Open-world Classification. In Proceedings of the Third Workshop on Insights from Negative Results in NLP, pages 18–23, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):: Evaluating the Practical Utility of Confidence-score based Techniques for Unsupervised Open-world Classification (Khosla & Gangadharaiah, insights 2022)
PDF:: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.insights-1.3.pdf
Video:: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.insights-1.3.mp4