Spandan Anaokar

2025

HalluDetect: Detecting, Mitigating, and Benchmarking Hallucinations in Conversational Systems in the Legal Domain
Spandan Anaokar | Shrey Ganatra | Swapnil Bhattacharyya | Harshvivek Kashid | Shruthi N Nair | Reshma Sekhar | Siddharth Manohar | Rahul Hemrajani | Pushpak Bhattacharyya
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Large Language Models (LLMs) are widely used in industry but remain prone to hallucinations, limiting their reliability in critical applications. This work addresses hallucination reduction in consumer grievance chatbots built using LLaMA 3.1 8B Instruct, a compact model frequently used in industry. We develop **HalluDetect**, an LLM-based hallucination detection system that achieves an F1 score of **68.92%**, outperforming baseline detectors by **22.47%**. Benchmarking five hallucination mitigation architectures, we find that AgentBot minimizes hallucinations to **0.4159** per turn while maintaining the highest token accuracy (**96.13%**), making it the most effective mitigation strategy. Our findings provide a scalable framework for hallucination mitigation, demonstrating that optimized inference strategies can significantly improve factual accuracy.

Access to consumer grievance redressal in India is often hindered by procedural complexity, legal jargon, and jurisdictional challenges. To address this, we present Grahak-Nyay (Justice-to-Consumers), a chatbot that streamlines the process using open-source Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). Grahak-Nyay simplifies legal complexities through a concise and up-to-date knowledge base. We introduce three novel datasets: GeneralQA (general consumer law), SectoralQA (sector-specific knowledge), and SyntheticQA (for RAG evaluation), along with NyayChat, a dataset of 303 annotated chatbot conversations. We further introduce Judgments data sourced from Indian Consumer Courts to aid the chatbot in decision-making and to enhance user trust. Finally, we propose HAB metrics (Helpfulness, Accuracy, Brevity) to evaluate chatbot performance. Legal domain experts validated Grahak-Nyay’s effectiveness. Code and datasets are available at https://github.com/ShreyGanatra/GrahakNyay.git.

AI-based judicial assistance and case prediction have been extensively studied in criminal and civil domains, but remain largely unexplored in consumer law, especially in India. In this paper, we present Nyay-Darpan, a novel two-in-one framework that (i) summarizes consumer case files and (ii) retrieves similar case judgements to aid decision-making in consumer dispute resolution. Our methodology not only addresses the gap in consumer law AI tools, but also introduces an innovative approach to evaluate the quality of the summary. The term ‘Nyay-Darpan’ translates into ‘Mirror of Justice’, symbolizing the ability of our tool to reflect the core of consumer disputes through precise summarization and intelligent case retrieval. Our system achieves over 75 percent precision in similar case prediction and approximately 70 percent accuracy across material summary evaluation metrics, demonstrating its practical effectiveness. We will publicly release the Nyay-Darpan framework and dataset to promote reproducibility and facilitate further research in this underexplored yet impactful domain.

Co-authors

Siddharth Manohar 3

Shruthi N Nair 3

Reshma Sekhar 3
