HalluDetect: Detecting, Mitigating, and Benchmarking Hallucinations in Conversational Systems in the Legal Domain
Spandan Anaokar | Shrey Ganatra | Swapnil Bhattacharyya | Harshvivek Kashid | Shruthi N Nair | Reshma Sekhar | Siddharth Manohar | Rahul Hemrajani | Pushpak Bhattacharyya
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Large Language Models (LLMs) are widely used in industry but remain prone to hallucinations, limiting their reliability in critical applications. This work addresses hallucination reduction in consumer grievance chatbots built on LLaMA 3.1 8B Instruct, a compact model frequently deployed in industry. We develop **HalluDetect**, an LLM-based hallucination detection system that achieves an F1 score of **68.92%**, outperforming baseline detectors by **22.47%**. Benchmarking five hallucination mitigation architectures, we find that AgentBot is the most effective: it minimizes hallucinations to **0.4159** per turn while maintaining the highest token accuracy (**96.13%**). Our findings provide a scalable framework for hallucination mitigation and demonstrate that optimized inference strategies can substantially improve factual accuracy.
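To make the idea of an LLM-based hallucination detector concrete, the sketch below prompts a judge model to flag claims in a chatbot response that are unsupported by reference context. This is a minimal illustration, not the paper's HalluDetect implementation: the prompt wording, function names, and example context are assumptions; only the model identifier follows the abstract.

```python
# Illustrative sketch of an LLM-as-judge hallucination detector.
# Prompt template and I/O format are assumptions, not the paper's HalluDetect.
from transformers import pipeline

# The abstract's chatbot model; a detector could use the same or a stronger judge.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
)

DETECTOR_PROMPT = """You are a hallucination detector for a consumer grievance chatbot.
Given the reference context and a chatbot response, list every claim in the
response that is NOT supported by the context. If all claims are supported,
answer exactly: NO_HALLUCINATION.

Context:
{context}

Response:
{response}

Unsupported claims:"""


def detect_hallucinations(context: str, response: str) -> str:
    """Ask the judge model to flag unsupported claims in `response`."""
    prompt = DETECTOR_PROMPT.format(context=context, response=response)
    # Greedy decoding; return only the continuation, not the echoed prompt.
    out = generator(prompt, max_new_tokens=256, do_sample=False,
                    return_full_text=False)
    return out[0]["generated_text"].strip()


if __name__ == "__main__":
    # Hypothetical consumer-grievance example: the response contradicts the context.
    verdict = detect_hallucinations(
        context="Under the Consumer Protection Act, 2019, complaints valued up "
                "to Rs. 50 lakh are filed with the District Commission.",
        response="You must file your Rs. 10 lakh claim with the State Commission.",
    )
    print(verdict)
```

In a full system, the judge's free-text verdicts would be parsed into per-claim labels and scored against human annotations (e.g., via F1, as reported in the abstract); per-turn hallucination counts like the 0.4159 figure would then come from aggregating flagged claims over conversation turns.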