Rishav Sahay
Ambiguous user queries pose a significant challenge in task-oriented dialogue systems that rely on information retrieval. While Large Language Models (LLMs) have shown promise in generating clarification questions to tackle query ambiguity, they rely solely on the top-k retrieved documents for clarification, which fails when ambiguity is too high to retrieve relevant documents in the first place. Traditional approaches lack principled mechanisms to determine when to use broad domain knowledge versus specific retrieved-document context for clarification. We propose AsK, a novel hybrid approach that dynamically chooses between document-based and aspect-based clarification depending on query ambiguity. Our approach requires no labeled clarification data and introduces: (1) weakly supervised, Longformer-based ambiguity analysis; (2) automated domain-specific aspect generation using clustering and LLMs; and (3) LLM-powered clarification generation. AsK demonstrates significant improvements over baselines in both single-turn and multi-turn settings (recall@5 gain of ~20%) when evaluated on product troubleshooting and product search datasets.
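A minimal sketch of the ambiguity-based routing this abstract describes is shown below; the function names (score_ambiguity, retrieve_top_k, generate_clarification), the aspect list, and the threshold are hypothetical placeholders for illustration, not the paper's actual implementation.

```python
# Illustrative sketch of AsK-style routing (hypothetical names and threshold).
from typing import List

AMBIGUITY_THRESHOLD = 0.6  # assumed cut-off; the paper derives this signal with a weakly supervised Longformer

def score_ambiguity(query: str) -> float:
    """Placeholder for the ambiguity analyzer (e.g. a fine-tuned Longformer)."""
    raise NotImplementedError

def retrieve_top_k(query: str, k: int = 5) -> List[str]:
    """Placeholder for the retriever over the document collection."""
    raise NotImplementedError

def generate_clarification(query: str, context: List[str]) -> str:
    """Placeholder for the LLM prompt that turns context into a clarification question."""
    raise NotImplementedError

# Assumed pre-computed, domain-specific aspects (in the paper these come from clustering + LLMs).
DOMAIN_ASPECTS = ["device model", "error message", "connection type"]

def clarify(query: str) -> str:
    # Highly ambiguous queries fall back to broad, aspect-based clarification;
    # otherwise the top-k retrieved documents ground the clarification question.
    if score_ambiguity(query) > AMBIGUITY_THRESHOLD:
        return generate_clarification(query, DOMAIN_ASPECTS)
    return generate_clarification(query, retrieve_top_k(query))
```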
Task-oriented Dialogue (ToD) systems are essential in automating user interactions, but their complex design and dynamic nature make evaluation particularly challenging. Current evaluation methodologies heavily depend on human annotators, which can be inefficient, subjective, and expensive to scale. To advance the field, there is a pressing need for a reliable, scalable, and systematic evaluation framework that can provide comprehensive insights into ToD system performance. In this paper, we propose AutoEval-TOD, an automated end-to-end evaluation framework using large language models (LLMs). Our framework first interacts with the ToD system and then assesses its performance across key dimensions by analyzing both the ToD system’s responses and internal states. We validate our approach by applying it to multiple ToD systems, highlighting its adaptability and potential for widespread use in both research and industrial settings.
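A rough sketch of the interact-then-assess loop such a framework could follow; the ToDSystem interface, the user-simulator and judge placeholders, and the dimension names are assumptions for illustration, not the framework's actual API.

```python
# Illustrative sketch of an LLM-driven interact-then-assess evaluation loop (assumed interfaces).
from dataclasses import dataclass, field
from typing import Dict, List, Protocol

class ToDSystem(Protocol):
    def respond(self, user_utterance: str) -> str: ...
    def internal_state(self) -> Dict[str, str]: ...  # e.g. tracked slots, chosen actions

@dataclass
class Episode:
    turns: List[Dict[str, str]] = field(default_factory=list)
    states: List[Dict[str, str]] = field(default_factory=list)

def simulate_user(history: List[Dict[str, str]], goal: str) -> str:
    """Placeholder: an LLM playing the user, conditioned on the dialogue so far and a goal."""
    raise NotImplementedError

def judge(episode: Episode, dimension: str) -> float:
    """Placeholder: an LLM judge scoring one dimension from the transcript and internal states."""
    raise NotImplementedError

def evaluate(system: ToDSystem, goal: str, max_turns: int = 10) -> Dict[str, float]:
    episode = Episode()
    for _ in range(max_turns):
        user_msg = simulate_user(episode.turns, goal)
        reply = system.respond(user_msg)
        episode.turns.append({"user": user_msg, "system": reply})
        episode.states.append(system.internal_state())
    # Score each key dimension from the full interaction (dimension names assumed).
    dimensions = ["task success", "response quality", "state-tracking accuracy"]
    return {d: judge(episode, d) for d in dimensions}
```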
Effective customer support requires domain-specific solutions tailored to users’ issues. However, LLMs like ChatGPT, while excelling in open-domain tasks, often face challenges such as hallucinations, lack of domain compliance, and imprecise solutions when applied to specialized contexts. Retrieval-Augmented Generation (RAG) systems, designed to combine domain context from unstructured knowledge bases (KBs) with LLMs, often struggle with noisy retrievals, further limiting their effectiveness in addressing user issues. Consequently, a sanitized KB is essential to ensure solution accuracy, precision, and domain compliance. To address this, we propose AutoKB, an automated pipeline for building a domain-specific KB with a hierarchical tree structure that maps user issues to precise and domain-compliant solutions. This structure facilitates granular issue resolution by improving real-time retrieval of user-specific solutions. Experiments in troubleshooting and medical domains demonstrate that our approach significantly enhances solution correctness, preciseness, and domain compliance, outperforming LLMs and unstructured KB baselines. Moreover, AutoKB is 75 times more cost-effective than manual methods.
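A small sketch of the hierarchical issue-to-solution tree and the retrieval walk it enables; the node fields and the similarity function are illustrative assumptions, not the pipeline's actual schema.

```python
# Illustrative sketch of a hierarchical issue -> solution KB and a greedy retrieval walk (assumed schema).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class IssueNode:
    issue: str                      # issue description at this level of granularity
    solution: Optional[str] = None  # domain-compliant solution, present at leaf nodes
    children: List["IssueNode"] = field(default_factory=list)

def similarity(query: str, issue: str) -> float:
    """Placeholder for an embedding-based similarity between the user query and an issue node."""
    raise NotImplementedError

def resolve(query: str, root: IssueNode) -> Optional[str]:
    # Walk from the root toward the most similar child until a leaf is reached,
    # narrowing retrieval to the most specific, sanitized solution for the user's issue.
    node = root
    while node.children:
        node = max(node.children, key=lambda child: similarity(query, child.issue))
    return node.solution
```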