Using Locally Learnt Word Representations for better Textual Anomaly Detection

Alicia Breidenstein, Matthieu Labeau


Abstract
The literature on general-purpose textual anomaly detection is quite sparse, as most textual anomaly detection methods are implemented as out-of-domain detection in the context of pre-established classification tasks. Notably, in a field where pre-trained representations and models are in common use, the impact of the pre-training data on a task that lacks supervision has not been studied. In this paper, we use the simple setting of k-classes-out anomaly detection and search for the best pairing of representation and classifier. We show that well-chosen embeddings allow a simple anomaly detection baseline such as OC-SVM to achieve results similar to, and even outperform, deep state-of-the-art models.
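
As a concrete illustration of the k-classes-out setting the abstract describes, below is a minimal sketch: a one-class SVM (OC-SVM) is fit on representations of documents from k "inlier" classes, then used to score held-out documents that include the left-out classes. TF-IDF vectors stand in for the locally learnt word representations studied in the paper, and the dataset, inlier classes, and nu/gamma values are illustrative assumptions, not the authors' exact pipeline.

# Sketch of k-classes-out anomaly detection with an OC-SVM baseline.
# Representation and hyperparameters here are placeholder assumptions.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.svm import OneClassSVM

# The k classes treated as "normal"; every other class is anomalous (assumption).
INLIERS = {"sci.space", "rec.autos"}

train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

inlier_ids = {i for i, name in enumerate(train.target_names) if name in INLIERS}
train_docs = [d for d, y in zip(train.data, train.target) if y in inlier_ids]

# TF-IDF stands in for the paper's locally learnt word representations.
vectorizer = TfidfVectorizer(max_features=5000)
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test.data)
y_test = np.array([0 if y in inlier_ids else 1 for y in test.target])  # 1 = anomaly

# Fit the one-class SVM on inlier documents only; score the held-out mix.
ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_train)
scores = -ocsvm.decision_function(X_test)  # higher score = more anomalous
print(f"ROC-AUC vs. held-out classes: {roc_auc_score(y_test, scores):.3f}")

Swapping the TF-IDF vectorizer for averaged word embeddings (pre-trained or locally learnt) changes only the representation step; the detector and evaluation stay the same, which is the pairing the paper searches over.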
Anthology ID:
2024.insights-1.11
Volume:
Proceedings of the Fifth Workshop on Insights from Negative Results in NLP
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Shabnam Tafreshi, Arjun Akula, João Sedoc, Aleksandr Drozd, Anna Rogers, Anna Rumshisky
Venues:
insights | WS
Publisher:
Association for Computational Linguistics
Pages:
82–91
URL:
https://aclanthology.org/2024.insights-1.11
Cite (ACL):
Alicia Breidenstein and Matthieu Labeau. 2024. Using Locally Learnt Word Representations for better Textual Anomaly Detection. In Proceedings of the Fifth Workshop on Insights from Negative Results in NLP, pages 82–91, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Using Locally Learnt Word Representations for better Textual Anomaly Detection (Breidenstein & Labeau, insights-WS 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.insights-1.11.pdf