Saar Kuzi

2025

Generative Product Recommendations for Implicit Superlative Queries
Kaustubh Dhole | Nikhita Vedula | Saar Kuzi | Giuseppe Castellucci | Eugene Agichtein | Shervin Malmasi
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)

In recommender systems, users often seek the best products through indirect, vague, or under-specified queries such as “best shoes for trail running.” These queries, referred to as implicit superlative queries, pose a challenge for standard retrieval and ranking systems because they lack explicit attribute mentions and require identifying and reasoning over complex attributes. We investigate how Large Language Models (LLMs) can generate implicit attributes for ranking and reason over them to improve product recommendations for such queries. As a first step, we propose a novel four-point schema, called SUPERB, for annotating the best product candidates for superlative queries, paired with LLM-based product annotations. We then empirically evaluate several existing retrieval and ranking approaches on our newly created dataset, providing insights and discussing how to integrate these findings into real-world e-commerce production systems.
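
To make the annotation idea concrete, here is a minimal Python sketch of the two LLM steps the abstract describes: surfacing the implicit attributes behind a superlative query, and grading a candidate product on a four-point scale. The prompts and the `call_llm` helper are hypothetical placeholders, not the paper's actual SUPERB schema or annotation pipeline.

```python
# Illustrative sketch only; `call_llm` is a hypothetical stand-in for any
# chat-completion client, and the prompts are assumptions, not the paper's.
import json

def call_llm(prompt: str) -> str:
    """Placeholder; replace with a real LLM API call."""
    raise NotImplementedError

def extract_implicit_attributes(query: str) -> list[str]:
    # Ask the model which product attributes a superlative query implies.
    prompt = (
        f"Query: {query}\n"
        "List the product attributes this query implicitly asks for, "
        "as a JSON array of short strings."
    )
    return json.loads(call_llm(prompt))

def annotate_candidate(query: str, product_title: str, attributes: list[str]) -> int:
    # Grade a candidate on a four-point scale (0 = irrelevant ... 3 = best),
    # loosely mirroring a four-point annotation schema.
    prompt = (
        f"Query: {query}\nProduct: {product_title}\n"
        f"Relevant attributes: {', '.join(attributes)}\n"
        "On a 0-3 scale, how well does this product satisfy the query? "
        "Answer with a single integer."
    )
    return int(call_llm(prompt).strip())

# Example usage (hypothetical product name):
# attrs = extract_implicit_attributes("best shoes for trail running")
# grade = annotate_candidate("best shoes for trail running", "XTrail Runner 2", attrs)
```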

Ambiguity Detection and Uncertainty Calibration for Question Answering with Large Language Models
Zhengyan Shi | Giuseppe Castellucci | Simone Filice | Saar Kuzi | Elad Kravi | Eugene Agichtein | Oleg Rokhlenko | Shervin Malmasi
Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)

Large Language Models (LLMs) have demonstrated excellent capabilities in Question Answering (QA) tasks, yet their ability to identify and address ambiguous questions remains underdeveloped. Ambiguities in user queries often lead to inaccurate or misleading answers, undermining user trust in these systems. Despite prior attempts using prompt-based methods, performance has largely been equivalent to random guessing, leaving a significant gap in effective ambiguity detection. To address this, we propose a novel framework for detecting ambiguous questions within LLM-based QA systems. We first prompt an LLM to generate multiple answers to a question, and then analyze them to infer the ambiguity. We classify ambiguity with a lightweight Random Forest model trained on a bootstrapped and shuffled set of 6-shot examples. Experimental results on the ASQA, PACIFIC, and ABG-COQA datasets demonstrate the effectiveness of our approach, with accuracy of up to 70.8%. Furthermore, our framework enhances the confidence calibration of LLM outputs, leading to more trustworthy QA systems that are able to handle complex questions.
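
A minimal sketch of the multiple-answers idea, assuming a scikit-learn Random Forest: sample several answers per question, convert their disagreement into features, and classify ambiguity. The two features below (answer diversity and mean pairwise token overlap) are illustrative assumptions, not the paper's feature set.

```python
# Sketch under stated assumptions: the features are hypothetical disagreement
# proxies, not the paper's exact signals or training recipe.
from itertools import combinations
from sklearn.ensemble import RandomForestClassifier

def disagreement_features(answers: list[str]) -> list[float]:
    normalized = [a.strip().lower() for a in answers]
    n_unique = len(set(normalized))

    def jaccard(a: str, b: str) -> float:
        # Token-set overlap between two answers.
        sa, sb = set(a.split()), set(b.split())
        return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

    pairs = list(combinations(normalized, 2))
    mean_overlap = (sum(jaccard(a, b) for a, b in pairs) / len(pairs)) if pairs else 1.0
    return [n_unique / len(answers), mean_overlap]

# X: one feature row per question (from its sampled answers); y: 1 = ambiguous.
X = [disagreement_features(ans) for ans in [
    ["paris", "paris", "paris"],            # consistent -> likely unambiguous
    ["1990", "1994", "it depends on ..."],  # inconsistent -> likely ambiguous
]]
y = [0, 1]
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```

At inference time, the same features would be computed from freshly sampled answers and passed to `clf.predict`.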

2023

External Knowledge Acquisition for End-to-End Document-Oriented Dialog Systems
Tuan M. Lai | Giuseppe Castellucci | Saar Kuzi | Heng Ji | Oleg Rokhlenko
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

End-to-end neural models for conversational AI often assume that a response can be generated by considering only the knowledge acquired by the model during training. Document-oriented conversational models make a similar assumption by conditioning the input on the document and assuming that any other knowledge is captured in the model’s weights. However, a conversation may refer to external knowledge sources. In this work, we present EKo-Doc, an architecture for document-oriented conversations with access to external knowledge: we assume that a conversation is centered around a topic document and that external knowledge is needed to produce responses. EKo-Doc includes a dense passage retriever, a re-ranker, and a response generation model. We train the model end-to-end by using silver labels for the retrieval and re-ranking components that we automatically acquire from the attention signals of the response generation model. We demonstrate with automatic and human evaluations that incorporating external knowledge improves response generation in document-oriented conversations. Our architecture achieves new state-of-the-art results on the Wizard of Wikipedia dataset, outperforming a competitive baseline by 10.3% in Recall@1 and 7.4% in ROUGE-L.
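
A structural sketch of the pipeline the abstract outlines: dense retrieval, re-ranking, and generation, with silver labels derived from the generator's attention. Every function body here is a hypothetical stand-in, not EKo-Doc's actual components or training code.

```python
# Sketch only: placeholders mark where real models would plug in.

def dense_retrieve(query: str, corpus: list[str], k: int = 20) -> list[str]:
    """Placeholder for a dense passage retriever (e.g., a bi-encoder)."""
    raise NotImplementedError

def rerank(query: str, passages: list[str], k: int = 3) -> list[str]:
    """Placeholder for a re-ranker over the retrieved candidates."""
    raise NotImplementedError

def generate(history: list[str], topic_doc: str, passages: list[str]) -> str:
    """Placeholder generator conditioned on the topic document plus the
    external passages selected by the two stages above."""
    raise NotImplementedError

def respond(history: list[str], topic_doc: str, corpus: list[str]) -> str:
    query = history[-1]                       # condition retrieval on the last turn
    candidates = dense_retrieve(query, corpus)
    selected = rerank(query, candidates)
    return generate(history, topic_doc, selected)

def silver_positive(attention_mass: dict[str, float]) -> str:
    # One plausible reading of the attention-signal idea: treat the passage
    # that receives the most generator attention as a silver-positive label
    # for training the retriever and re-ranker.
    return max(attention_mass, key=attention_mass.get)
```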

2022

Wizard of Tasks: A Novel Conversational Dataset for Solving Real-World Tasks in Conversational Settings
Jason Ingyu Choi | Saar Kuzi | Nikhita Vedula | Jie Zhao | Giuseppe Castellucci | Marcus Collins | Shervin Malmasi | Oleg Rokhlenko | Eugene Agichtein
Proceedings of the 29th International Conference on Computational Linguistics

Conversational Task Assistants (CTAs) are conversational agents whose goal is to help humans perform real-world tasks. CTAs can help in exploring available tasks, answering task-specific questions, and guiding users through step-by-step instructions. In this work, we present Wizard of Tasks, the first corpus of such conversations in two domains: Cooking and Home Improvement. We crowd-sourced a total of 549 conversations (18,077 utterances) with an asynchronous Wizard-of-Oz setup, relying on recipes from WholeFoods Market for the cooking domain and WikiHow articles for the home improvement domain. We present a detailed data analysis and show that the collected data can be a valuable and challenging resource for CTAs on two tasks: Intent Classification (IC) and Abstractive Question Answering (AQA). While on IC we obtained a high-performing model (>85% F1), performance on AQA is far from satisfactory (~27% BERTScore-F1), suggesting that more work is needed to solve the task of low-resource AQA.
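
A minimal evaluation sketch for the two benchmark tasks the abstract reports, assuming the scikit-learn and bert_score packages; the intent labels and example texts below are hypothetical illustrations, not drawn from the dataset's actual schema.

```python
# Sketch under stated assumptions: metrics match the abstract (F1 for IC,
# BERTScore-F1 for AQA), but all labels and texts here are made up.
from sklearn.metrics import f1_score
from bert_score import score as bertscore

# Intent Classification: compare predicted vs. gold intent labels.
gold_intents = ["ask_question", "next_step", "ask_question"]
pred_intents = ["ask_question", "next_step", "chitchat"]
ic_f1 = f1_score(gold_intents, pred_intents, average="macro")

# Abstractive QA: BERTScore-F1 between generated and reference answers.
generated = ["Preheat the oven to 350F before adding the tray."]
references = ["You should preheat your oven to 350 degrees first."]
_, _, f1 = bertscore(generated, references, lang="en")
aqa_f1 = f1.mean().item()

print(f"IC macro-F1: {ic_f1:.3f}, AQA BERTScore-F1: {aqa_f1:.3f}")
```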