Abstract
Open-world classification in dialog systems requires models to detect open intents while ensuring the quality of in-domain (ID) intent classification. In this work, we revisit methods that leverage distance-based statistics for unsupervised out-of-domain (OOD) detection. We show that despite their superior performance on threshold-independent metrics such as AUROC on the test set, threshold values chosen based on performance on a validation set do not generalize well to the test set, resulting in substantially lower ID and OOD detection accuracy and F1 scores. Our analysis shows that this lack of generalizability can be mitigated by setting aside a hold-out set from the validation data for threshold selection (sometimes yielding relative gains as high as 100%). Extensive experiments on seven benchmark datasets show that this fix puts the performance of these methods on par with, or sometimes even better than, current state-of-the-art OOD detection techniques.
- Anthology ID:
- 2022.insights-1.3
- Volume:
- Proceedings of the Third Workshop on Insights from Negative Results in NLP
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Shabnam Tafreshi, João Sedoc, Anna Rogers, Aleksandr Drozd, Anna Rumshisky, Arjun Akula
- Venue:
- insights
- Publisher:
- Association for Computational Linguistics
- Pages:
- 18–23
- URL:
- https://aclanthology.org/2022.insights-1.3
- DOI:
- 10.18653/v1/2022.insights-1.3
- Cite (ACL):
- Sopan Khosla and Rashmi Gangadharaiah. 2022. Evaluating the Practical Utility of Confidence-score based Techniques for Unsupervised Open-world Classification. In Proceedings of the Third Workshop on Insights from Negative Results in NLP, pages 18–23, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- Evaluating the Practical Utility of Confidence-score based Techniques for Unsupervised Open-world Classification (Khosla & Gangadharaiah, insights 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2022.insights-1.3.pdf
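The threshold-selection fix summarized in the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper. First, Euclidean distance to the nearest class centroid stands in for the distance-based statistics the paper evaluates. Second, the threshold is picked so that a fixed fraction (here 95%) of ID examples in a hold-out split of the validation data is accepted; the paper's actual selection criterion may differ. All function names (`class_centroids`, `ood_score`, `split_holdout`, `pick_threshold`, `predict`) are hypothetical. The key point shown is that the threshold is computed on a hold-out portion of the validation data, not on the same validation examples used for model selection.

```python
import math
import random
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def class_centroids(feats: List[Vector], labels: List[int]) -> Dict[int, List[float]]:
    """Mean feature vector per in-domain (ID) intent, fit on training data."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(feats, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        for i, v in enumerate(x):
            sums[y][i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def ood_score(x: Vector, centroids: Dict[int, List[float]]) -> float:
    """Assumed score: distance to the nearest centroid (larger = more likely OOD)."""
    return min(math.dist(x, c) for c in centroids.values())


def split_holdout(feats, labels, holdout_frac=0.5, seed=0):
    """Randomly split validation data into (selection part, hold-out part)."""
    idx = list(range(len(feats)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) - int(len(idx) * holdout_frac)
    sel, hold = idx[:cut], idx[cut:]
    return ([feats[i] for i in sel], [labels[i] for i in sel],
            [feats[i] for i in hold], [labels[i] for i in hold])


def pick_threshold(id_scores: List[float], id_acceptance: float = 0.95) -> float:
    """Smallest score at or below which about `id_acceptance` of ID scores fall."""
    ordered = sorted(id_scores)
    k = min(len(ordered) - 1, max(0, math.ceil(id_acceptance * len(ordered)) - 1))
    return ordered[k]


def predict(x: Vector, centroids: Dict[int, List[float]], threshold: float) -> Optional[int]:
    """Nearest ID intent, or None (open intent) if the OOD score exceeds the threshold."""
    if ood_score(x, centroids) > threshold:
        return None
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))


if __name__ == "__main__":
    # Toy 2-D "features" for two ID intents (hypothetical data).
    train_x = [(0, 0), (0, 2), (2, 0), (2, 2), (10, 10), (10, 12), (12, 10), (12, 12)]
    train_y = [0, 0, 0, 0, 1, 1, 1, 1]
    val_x = [(1, 0), (0, 1), (2, 1), (1, 2), (11, 10), (10, 11), (12, 11), (11, 12)]
    val_y = [0, 0, 0, 0, 1, 1, 1, 1]

    centroids = class_centroids(train_x, train_y)
    # The selection part would serve model selection (e.g. early stopping); not shown.
    _, _, hold_x, _ = split_holdout(val_x, val_y, holdout_frac=0.5)
    threshold = pick_threshold([ood_score(x, centroids) for x in hold_x])
    print(predict((1, 1), centroids, threshold))    # 0
    print(predict((30, -5), centroids, threshold))  # None
```

Keeping the hold-out split disjoint from whatever data drives model selection is the design choice that matters here; the particular score and acceptance rate are placeholders that a real system would replace with its own distance statistic and operating point.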