Abstract
The literature on general-purpose textual anomaly detection is quite sparse, as most textual anomaly detection methods are implemented as out-of-domain detection in the context of pre-established classification tasks. Notably, in a field where pre-trained representations and models are in common use, the impact of the pre-training data on a task that lacks supervision has not been studied. In this paper, we use the simple setting of k-classes-out anomaly detection and search for the best pairing of representation and classifier. We show that well-chosen embeddings allow a simple anomaly detection baseline such as OC-SVM to achieve similar results and even outperform deep state-of-the-art models.
- Anthology ID:
- 2024.insights-1.11
- Volume:
- Proceedings of the Fifth Workshop on Insights from Negative Results in NLP
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Shabnam Tafreshi, Arjun Akula, João Sedoc, Aleksandr Drozd, Anna Rogers, Anna Rumshisky
- Venues:
- insights | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 82–91
- URL:
- https://aclanthology.org/2024.insights-1.11
- DOI:
- 10.18653/v1/2024.insights-1.11
- Cite (ACL):
- Alicia Breidenstein and Matthieu Labeau. 2024. Using Locally Learnt Word Representations for better Textual Anomaly Detection. In Proceedings of the Fifth Workshop on Insights from Negative Results in NLP, pages 82–91, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Using Locally Learnt Word Representations for better Textual Anomaly Detection (Breidenstein & Labeau, insights-WS 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.insights-1.11.pdf
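The k-classes-out setting described in the abstract — train on the inlier classes, score a held-out class as anomalous — can be sketched with an OC-SVM baseline. This is a minimal illustrative sketch, not the paper's implementation: the toy corpus is invented, and TF-IDF vectors stand in for the pre-trained or locally learnt embeddings the paper actually compares.

```python
# Hypothetical sketch of k-classes-out anomaly detection with an OC-SVM
# baseline. TF-IDF stands in for the embeddings studied in the paper.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_auc_score

# Toy corpus: "sports" is the inlier class; "cooking" is held out as anomalous.
inlier_train = [
    "the team won the match",
    "a great goal in the final",
    "the coach praised the players",
    "fans cheered the striker",
]
inlier_test = ["the match ended in a draw", "the goalkeeper saved a penalty"]
anomalies = ["simmer the sauce gently", "whisk the eggs with sugar"]

# Fit the representation on inlier training data only.
vec = TfidfVectorizer().fit(inlier_train)
X_train = vec.transform(inlier_train)
X_test = vec.transform(inlier_test + anomalies)

# One-class SVM: higher decision_function values mean "more normal".
oc = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_train)
scores = oc.decision_function(X_test)

# Evaluate with AUROC: label 1 = inlier, 0 = anomaly.
y = np.array([1] * len(inlier_test) + [0] * len(anomalies))
auroc = roc_auc_score(y, scores)
print(f"AUROC: {auroc:.2f}")
```

Swapping `TfidfVectorizer` for a pre-trained sentence encoder changes only the representation step, which is exactly the pairing of representation and classifier the paper investigates.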