Javaid Nabi


2026

Despite advances in large language models (LLMs), Task-Oriented Dialogue (TOD) systems often fall short in delivering personalized, context-rich responses, especially in low-resource, code-mixed, and multimodal settings like Hinglish (Hindi-English). To bridge this gap, we introduce HiVisTask, the first Hinglish multimodal, multidomain, persona-based TOD dataset that captures user-agent interactions across text and visual modalities. We also propose G3 TOD, a generalizable framework that enhances personalization using three structured knowledge graphs: entity context, user persona, and commonsense reasoning, all extracted from conversation history. Extensive experiments with LLMs (e.g., LLaMA3.2, Phi3, GPT4, Mistral7b, Qwen3, Gemma3) show that G3 TOD consistently outperforms both standard and ablated baselines. We observe substantial gains across evaluation metrics (both quantitative: BLEU ↑ and qualitative: Human Eval ↑) over existing models. The observed improvements strongly underscore the value of structured and selective contextualization in generating personalized and engaging multimodal responses.

2024

The success of virtual assistants relies on continuous performance monitoring to ensure their competitive edge in the market. This entails assessing their ability to understand user intents and execute tasks effectively. While user feedback is pivotal for measuring satisfaction levels, relying solely on explicit feedback is impractical. Extracting implicit user feedback from conversations between users and the virtual assistant is therefore a more efficient approach. Beyond learning whether a task was performed correctly, it is also important to understand the reasons behind any incorrect execution. In this paper, we introduce a framework designed to identify unsatisfactory conversations, systematically analyze them, and generate comprehensive reports detailing the reasons for user dissatisfaction. Using a feedback classifier, we identify conversations that indicate user dissatisfaction, which we treat as implicit negative feedback. To analyze these negative feedback conversations more deeply, we develop a lightweight pipeline, called an issue categorizer ensemble, that uses multiple models to determine the reasons behind the dissatisfaction. We subsequently augment the identified instances to generate additional data and train our models to prevent such failures in the future. Our implementation of this simple framework, called AsTrix (Assisted Triage and Fix), led to significant improvements in the performance of our smartphone-based in-house virtual assistant, with successful task completion rates increasing from 83.1% to 92.6% between June 2022 and March 2024. Moreover, by automating the deeper analysis for just the five major issue types contributing to dissatisfaction, we address approximately 62% of the negative feedback conversation data.
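As an illustration of the issue categorizer ensemble idea described above, the following is a minimal sketch in which several categorizers label a conversation and the majority label wins. The abstract does not say how the ensemble combines its models, so majority voting is an assumption here, and the three rule-based categorizers, the label names (`intent_misunderstood`, `task_not_completed`, `other`), and the function `ensemble_label` are all hypothetical stand-ins rather than the AsTrix implementation.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical type: a categorizer maps a conversation transcript to an issue label.
Categorizer = Callable[[str], str]


def keyword_categorizer(conversation: str) -> str:
    """Toy rule: explicit complaints suggest the intent was misunderstood."""
    text = conversation.lower()
    if "not what i asked" in text or "didn't understand" in text:
        return "intent_misunderstood"
    return "other"


def repetition_categorizer(conversation: str) -> str:
    """Toy rule: a user repeating the same utterance suggests a misunderstanding."""
    user_turns = [
        line.strip().lower()
        for line in conversation.splitlines()
        if line.strip().lower().startswith("user:")
    ]
    if len(user_turns) != len(set(user_turns)):
        return "intent_misunderstood"
    return "other"


def length_categorizer(conversation: str) -> str:
    """Toy rule: very long exchanges suggest the task was never completed."""
    return "task_not_completed" if len(conversation.splitlines()) > 6 else "other"


def ensemble_label(conversation: str, categorizers: List[Categorizer]) -> str:
    """Assumed combination rule: return the label with the most votes.

    Ties are resolved in favor of the label that was voted for first.
    """
    votes = Counter(categorize(conversation) for categorize in categorizers)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    conversation = (
        "User: call mom\n"
        "Assistant: Playing music\n"
        "User: not what I asked\n"
        "User: call mom"
    )
    categorizers = [keyword_categorizer, repetition_categorizer, length_categorizer]
    print(ensemble_label(conversation, categorizers))
```

In a real system each categorizer would presumably be a trained model rather than a hand-written rule; the sketch only shows how independent predictions might be merged into one issue label for reporting.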
Customer reviews are a valuable asset for businesses, especially in the competitive consumer electronics sector, where understanding user preferences and product performance is critical. However, extracting meaningful insights from these unstructured and often noisy reviews is a challenging task that typically requires significant manual effort. We present
Data analytics allows organizations to make informed decisions based on the analysis of large amounts of data, yet effectively translating natural language queries into actionable insights remains a growing challenge. To address this challenge, we introduce a novel system that integrates natural language processing (NLP), graph-based feature representation, and code generation. Our method, called Analytics Graph Query Solver (AGQS), uses large language models (LLMs) to construct a dynamic graph representing keywords and engineered features. AGQS transforms textual input queries into structured descriptions and generates corresponding plans. These plans are executed stepwise to produce unified code, which is then applied to our in-house virtual assistant dataset to fulfill the user’s query. Furthermore, a robust verification module ensures the reliability of the obtained results. In our experiments, the system achieved an accuracy of 62.2%, outperforming models like GPT-4 (50.2%), Graph Reader (56.6%), Mistral3 7B (38.6%), and Llama 7B (37.6%). Overall, our approach highlights the importance of feature generation in textual query resolution and demonstrates notable improvements in accessibility and precision for data analytics. The method converts natural language queries into actionable steps and ultimately into code that provides data insights, and it can be applied across different datasets to help developers and researchers gain insights with little effort.