RAGulator: Lightweight Out-of-Context Detectors for Grounded Text Generation

Ian Poey, Jiajun Li, Qishuai Zhong


Abstract
Real-time identification of out-of-context outputs from large language models (LLMs) is crucial for enterprises to safely adopt retrieval-augmented generation (RAG) systems. In this work, we develop lightweight models capable of detecting when LLM-generated text deviates semantically from retrieved source documents. We compare their performance against open-source alternatives on data from credit policy and sustainability reports used in the banking industry. The fine-tuned DeBERTa model stands out for its superior performance, speed, and simplicity, as it requires no additional preprocessing or feature engineering. While recent research often prioritises state-of-the-art accuracy through fine-tuned generative LLMs and complex training pipelines, we demonstrate that detection models can be deployed efficiently, with high speed and minimal resource usage.
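The abstract describes the task as flagging generated sentences that are not grounded in the retrieved documents. As a minimal illustration of the task setup only (not the paper's fine-tuned DeBERTa detector), the sketch below scores each generated sentence against the retrieved passages with a naive bag-of-words cosine similarity and flags sentences whose best match falls below a threshold; the threshold value and helper names are assumptions for this example.

```python
from collections import Counter
import math
import re

def bow(text):
    # Lowercased bag-of-words vector over alphabetic tokens.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two Counter vectors.
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def is_out_of_context(sentence, passages, threshold=0.3):
    # Flag a generated sentence whose best lexical match among the
    # retrieved passages falls below the similarity threshold.
    # NOTE: purely illustrative baseline; RAGulator instead uses a
    # fine-tuned DeBERTa sequence classifier for semantic deviation.
    best = max((cosine(bow(sentence), bow(p)) for p in passages), default=0.0)
    return best < threshold

passages = ["The credit policy caps unsecured lending at 20% of the portfolio."]
grounded = "Unsecured lending is capped at 20% of the portfolio."
ungrounded = "Quarterly revenue grew by fifteen percent year on year."
```

A lexical baseline like this misses paraphrase and entailment, which is exactly the gap a semantic detector such as the paper's fine-tuned DeBERTa model is meant to close.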
Anthology ID:
2025.emnlp-industry.73
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
1057–1071
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.73/
Cite (ACL):
Ian Poey, Jiajun Li, and Qishuai Zhong. 2025. RAGulator: Lightweight Out-of-Context Detectors for Grounded Text Generation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1057–1071, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
RAGulator: Lightweight Out-of-Context Detectors for Grounded Text Generation (Poey et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.73.pdf