Purav Aggarwal


2025

ASK: Aspects and Retrieval based Hybrid Clarification in Task Oriented Dialogue Systems
Rishav Sahay | Lavanya Sita Tekumalla | Purav Aggarwal | Arihant Jain | Anoop Saladi
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)

Ambiguous user queries pose a significant challenge in task-oriented dialogue systems that rely on information retrieval. While Large Language Models (LLMs) have shown promise in generating clarification questions to tackle query ambiguity, they rely solely on the top-k retrieved documents for clarification, which fails when ambiguity is too high to retrieve relevant documents in the first place. Traditional approaches lack principled mechanisms to determine when to use broad domain knowledge versus specific retrieved-document context for clarification. We propose AsK, a novel hybrid approach that dynamically chooses between document-based and aspect-based clarification depending on query ambiguity. Our approach requires no labeled clarification data and introduces: (1) weakly supervised, Longformer-based ambiguity analysis, (2) automated domain-specific aspect generation using clustering and LLMs, and (3) LLM-powered clarification generation. AsK demonstrates significant improvements over baselines in both single-turn and multi-turn settings (recall@5 gain of ~20%) when evaluated on product troubleshooting and product search datasets.
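
The abstract names the routing mechanism but not its interface, so here is a minimal sketch of the kind of hybrid routing AsK describes: a weakly supervised ambiguity scorer gates between document-grounded and aspect-grounded clarification. All names here (`ambiguity_scorer`, `aspect_index`, `generate_clarification`, the threshold `tau`) are illustrative assumptions, not the paper's API.

```python
# Minimal sketch of AsK-style hybrid clarification routing (hypothetical
# interfaces; the abstract does not specify the actual ones).

from dataclasses import dataclass

@dataclass
class ClarificationResult:
    mode: str       # "document" or "aspect"
    question: str   # clarification question shown to the user

def clarify(query: str, retriever, ambiguity_scorer, aspect_index, llm,
            tau: float = 0.5, k: int = 5) -> ClarificationResult:
    """Route between document-based and aspect-based clarification.

    ambiguity_scorer: e.g. a weakly supervised Longformer classifier
    (assumption: returns an ambiguity score in [0, 1]).
    """
    score = ambiguity_scorer(query)
    if score < tau:
        # Low ambiguity: the top-k retrieved documents are likely relevant,
        # so ground the clarification question in them.
        docs = retriever.search(query, k=k)
        question = llm.generate_clarification(query, context=docs)
        return ClarificationResult(mode="document", question=question)
    # High ambiguity: retrieval is unreliable, so fall back to broad,
    # automatically mined domain aspects (clusters summarized by an LLM).
    aspects = aspect_index.nearest_aspects(query, k=k)
    question = llm.generate_clarification(query, context=aspects)
    return ClarificationResult(mode="aspect", question=question)
```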

AutoChunker: Structured Text Chunking and its Evaluation
Arihant Jain | Purav Aggarwal | Anoop Saladi
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)

Text chunking is fundamental to modern retrieval-augmented systems, yet existing methods often struggle with maintaining semantic coherence, both within and across chunks, while dealing with document structure and noise. We present AutoChunker, a bottom-up approach for text chunking that combines document structure awareness with noise elimination. AutoChunker leverages language models to identify and segregate logical units of information (a chunk) while preserving document hierarchy through a tree-based representation. To evaluate the chunking operator, we introduce a comprehensive evaluation framework based on five core tenets: noise reduction, completeness, context coherence, task relevance, and retrieval performance. Experimental results on Support and Wikipedia articles demonstrate that AutoChunker significantly outperforms existing methods, reducing noise while improving chunk completeness compared to state-of-the-art baselines. When integrated with an online product support system, our approach led to improvements in retrieval performance and customer return rates. Our work not only advances the state of text chunking but also provides a standardized framework for evaluating chunking strategies, addressing a critical gap in the field.
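
A minimal sketch of the tree-based representation the abstract describes, assuming a simple heading/text node schema; the paper's actual structure and noise handling are richer than this. Flattening each node with its heading path preserves the document hierarchy in every emitted chunk.

```python
# Illustrative sketch of a tree-based chunk representation in the spirit of
# AutoChunker (hypothetical schema; the paper's exact structure may differ).

from dataclasses import dataclass, field

@dataclass
class ChunkNode:
    heading: str                      # section/subsection title
    text: str = ""                    # logical unit of information at this node
    children: list["ChunkNode"] = field(default_factory=list)

    def flatten(self, path=()):
        """Emit chunks with their heading path so hierarchy survives retrieval."""
        path = path + (self.heading,)
        if self.text.strip():                     # skip empty/noise-only nodes
            yield {"path": " > ".join(path), "text": self.text}
        for child in self.children:
            yield from child.flatten(path)

doc = ChunkNode("Device Setup", children=[
    ChunkNode("Wi-Fi", "Open Settings > Network and select your SSID."),
    ChunkNode("Bluetooth", "Hold the pair button for 3 seconds."),
])
chunks = list(doc.flatten())   # each chunk keeps its place in the hierarchy
```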

VADE: Visual Attention Guided Hallucination Detection and Elimination
Vishnu Prabhakaran | Purav Aggarwal | Vinay Kumar Verma | Gokul Swamy | Anoop Saladi
Findings of the Association for Computational Linguistics: ACL 2025

Vision Language Models (VLMs) have achieved significant advancements in complex visual understanding tasks. However, VLMs are prone to hallucinations, generating outputs that lack alignment with visual content. This paper addresses hallucination detection in VLMs by leveraging the visual grounding information encoded in transformer attention maps. We identify three primary challenges in this approach: the elective nature of visual grounding for certain tokens, the high-dimensional and noisy nature of attention maps, and the dynamic sequence length of attention over previous tokens. To address these, we propose VADE, a novel sequence modelling approach that effectively learns complex sequential patterns from high-dimensional and noisy attention maps for fine-grained hallucination detection and mitigation. VADE achieves an average PR-AUC of 80% in hallucination detection on M-HalDetect across four different model architectures and a 5% improvement in hallucination mitigation on MSCOCO.
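
A hedged sketch of the sequence-modelling idea: per-token attention-map features of varying sequence length feed a recurrent classifier that emits token-level hallucination scores. The architecture, dimensions, and recurrent cell below are assumptions for illustration, not VADE's published design.

```python
# Sketch: sequence model over per-token attention-map features for
# fine-grained hallucination detection (architecture details assumed).

import torch
import torch.nn as nn

class AttnHallucinationDetector(nn.Module):
    def __init__(self, attn_feat_dim: int, hidden: int = 256):
        super().__init__()
        # Attention maps are high-dimensional and noisy; project before the RNN.
        self.proj = nn.Sequential(nn.Linear(attn_feat_dim, hidden), nn.ReLU())
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # per-token hallucination logit

    def forward(self, attn_feats: torch.Tensor) -> torch.Tensor:
        # attn_feats: (batch, seq_len, attn_feat_dim); seq_len varies because
        # attention over previous tokens grows as generation proceeds.
        h, _ = self.rnn(self.proj(attn_feats))
        return self.head(h).squeeze(-1)   # (batch, seq_len) token-level scores

detector = AttnHallucinationDetector(attn_feat_dim=1024)
scores = detector(torch.randn(2, 17, 1024))  # fine-grained (token) predictions
```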

AutoEval-ToD: Automated Evaluation of Task-oriented Dialog Systems
Arihant Jain | Purav Aggarwal | Rishav Sahay | Chaosheng Dong | Anoop Saladi
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Task-oriented Dialog systems (ToD) are essential in automating user interactions, but their complex design and dynamic nature make evaluation particularly challenging. Current evaluation methodologies heavily depend on human annotators, which can be inefficient, subjective, and expensive to scale. To advance the field, there is a pressing need for a reliable, scalable, and systematic evaluation framework that can provide comprehensive insights into ToD system performance. In this paper, we propose AutoEval-ToD, an automated end-to-end evaluation framework using large language models (LLMs). Our framework first interacts with the ToD system and then assesses its performance across key dimensions by analyzing both the ToD’s responses and internal states. We validate our approach by applying it to multiple ToD systems, highlighting its adaptability and potential for widespread use in both research and industrial settings.
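
A minimal sketch of such an LLM-in-the-loop evaluation: one LLM simulates a user pursuing a goal against the ToD system, and a second LLM scores the transcript and internal states along key dimensions. All object interfaces and dimension names here are hypothetical stand-ins, not AutoEval-ToD's actual components.

```python
# Sketch of an LLM-driven evaluation loop for a ToD system
# (function names are illustrative assumptions).

def evaluate_tod(tod_system, user_llm, judge_llm, goal: str, max_turns: int = 10):
    """Simulate a user with one LLM, then score the dialog with another."""
    transcript = []
    user_msg = user_llm.first_message(goal)
    for _ in range(max_turns):
        reply, state = tod_system.respond(user_msg)   # response + internal state
        transcript.append({"user": user_msg, "system": reply, "state": state})
        if user_llm.goal_satisfied(goal, transcript):
            break
        user_msg = user_llm.next_message(goal, transcript)
    # Judge key dimensions from both the responses and the internal states.
    return {
        dim: judge_llm.score(dim, goal, transcript)
        for dim in ("task_success", "coherence", "efficiency")
    }
```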

VIT-Pro: Visual Instruction Tuning for Product Images
Vishnu Prabhakaran | Purav Aggarwal | Vishruit Kulshreshtha | Arunita Das | Sahini Venkata Sitaram Sruti | Anoop Saladi
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)

General vision-language models (VLMs) trained on web data struggle to understand and converse about real-world e-commerce product images. We propose a cost-efficient approach for collecting training data to train a generative VLM for e-commerce product images. The key idea is to leverage large-scale, loosely-coupled image-text pairs from e-commerce stores, use a pretrained LLM to generate multimodal instruction-following data, and fine-tune a general vision-language model using LoRA. Our instruction-finetuned model, VIT-Pro, can understand and respond to queries about product images, covering diverse concepts and tasks. VIT-Pro outperforms several general-purpose VLMs on multiple vision tasks in the e-commerce domain.
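
The fine-tuning recipe named in the abstract (LoRA on a general VLM) can be sketched with standard libraries; the base checkpoint, rank, and target modules below are assumptions for illustration, not VIT-Pro's actual configuration.

```python
# Hedged sketch of LoRA fine-tuning a general VLM on generated e-commerce
# instruction data (base model and hyperparameters are assumptions).

from transformers import LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")

lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)    # only adapter weights are trainable
model.print_trainable_parameters()
# Training then proceeds on (product image, instruction, response) triples
# generated by a pretrained LLM from loosely coupled image-text pairs.
```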

AutoKB: Automated Creation of Structured Knowledge Bases for Domain-Specific Support
Rishav Sahay | Arihant Jain | Purav Aggarwal | Anoop Saladi
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)

Effective customer support requires domain-specific solutions tailored to users’ issues. However, LLMs like ChatGPT, while excelling in open-domain tasks, often face challenges such as hallucinations, lack of domain compliance, and imprecise solutions when applied to specialized contexts. RAG-based systems, designed to combine domain context from unstructured knowledge bases (KBs) with LLMs, often struggle with noisy retrievals, further limiting their effectiveness in addressing user issues. Consequently, a sanitized KB is essential to ensure solution accuracy, precision, and domain compliance. To address this, we propose AutoKB, an automated pipeline for building a domain-specific KB with a hierarchical tree structure that maps user issues to precise and domain-compliant solutions. This structure facilitates granular issue resolution by improving real-time retrieval of user-specific solutions. Experiments in troubleshooting and medical domains demonstrate that our approach significantly enhances solution correctness, preciseness, and domain compliance, outperforming LLMs and unstructured KB baselines. Moreover, AutoKB is 75 times more cost-effective than manual methods.
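
A toy sketch of the hierarchical issue-to-solution structure the abstract describes: queries are routed greedily from root to leaf, and leaves hold vetted, domain-compliant solutions. The node schema and the similarity function are illustrative stand-ins, not AutoKB's actual pipeline.

```python
# Illustrative sketch: hierarchical issue -> solution tree with greedy
# root-to-leaf routing at query time (names hypothetical).

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class IssueNode:
    issue: str                          # issue description at this level
    solution: Optional[str] = None      # leaf nodes carry vetted solutions
    children: list["IssueNode"] = field(default_factory=list)

def resolve(query: str, node: IssueNode, similarity) -> Optional[str]:
    """Walk the tree, at each level following the most similar child issue."""
    while node.children:
        node = max(node.children, key=lambda c: similarity(query, c.issue))
    return node.solution   # precise, domain-compliant answer (or None)

kb = IssueNode("device issues", children=[
    IssueNode("won't power on", children=[
        IssueNode("battery drained", "Charge for 30 minutes, then retry."),
        IssueNode("faulty adapter", "Test with a known-good adapter."),
    ]),
])
overlap = lambda q, t: len(set(q.split()) & set(t.split()))  # toy similarity
print(resolve("device will not power on battery dead", kb, overlap))
```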