HalluDetect: Detecting, Mitigating, and Benchmarking Hallucinations in Conversational Systems in the Legal Domain

Spandan Anaokar, Shrey Ganatra, Swapnil Bhattacharyya, Harshvivek Kashid, Shruthi N Nair, Reshma Sekhar, Siddharth Manohar, Rahul Hemrajani, Pushpak Bhattacharyya


Abstract
Large Language Models (LLMs) are widely used in industry but remain prone to hallucinations, limiting their reliability in critical applications. This work addresses hallucination reduction in consumer grievance chatbots built with LLaMA 3.1 8B Instruct, a compact model frequently used in industry. We develop **HalluDetect**, an LLM-based hallucination detection system that achieves an F1 score of **68.92%**, outperforming baseline detectors by **22.47%**. Benchmarking five hallucination mitigation architectures, we find that AgentBot minimizes hallucinations to **0.4159** per turn while maintaining the highest token accuracy (**96.13%**), making it the most effective mitigation strategy. Our findings provide a scalable framework for hallucination mitigation, demonstrating that optimized inference strategies can significantly improve factual accuracy.
Anthology ID:
2025.emnlp-industry.128
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2025
Address:
Suzhou (China)
Editors:
Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
1822–1847
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.128/
Cite (ACL):
Spandan Anaokar, Shrey Ganatra, Swapnil Bhattacharyya, Harshvivek Kashid, Shruthi N Nair, Reshma Sekhar, Siddharth Manohar, Rahul Hemrajani, and Pushpak Bhattacharyya. 2025. HalluDetect: Detecting, Mitigating, and Benchmarking Hallucinations in Conversational Systems in the Legal Domain. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1822–1847, Suzhou (China). Association for Computational Linguistics.
Cite (Informal):
HalluDetect: Detecting, Mitigating, and Benchmarking Hallucinations in Conversational Systems in the Legal Domain (Anaokar et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.128.pdf