@inproceedings{barrett-etal-2025-llms,
title = "Can {LLM}s Find a Needle in a Haystack? A Look at Anomaly Detection Language Modeling",
author = "Barrett, Leslie and
Bajaj, Vikram Sunil and
Kingan, Robert John",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.341/",
doi = "10.18653/v1/2025.findings-emnlp.341",
pages = "6428--6435",
ISBN = "979-8-89176-335-7",
abstract = "Anomaly detection (AD), also known as Outlier Detection, is a longstanding problem in machine learning, which has recently been applied to text data. In these datasets, a textual anomaly is a part of the text that does not fit the overall topic of the text. Some recent approaches to textual AD have used transformer models, achieving positive results but with trade-offs in pre-training time and inflexibility with respect to new domains. Others have used linear models which are fast and more flexible but not always competitive on certain datasets. We introduce a new approach based on Large Pre-trained Language Models in three modalities. Our findings indicate that LLMs beat baselines when AD is presented as an imbalanced classification problem regardless of the concentration of anomalous samples. However, their performance is markedly worse on unsupervised AD, suggesting that the concept of ``anomaly'' may somehow elude the LLM reasoning process."
}