Anup Pattnaik


2026

Large Language Models (LLMs) have demonstrated considerable efficacy in classification tasks, yet their performance depends on two critical prompt components: Task Instructions (HOW to classify) and Class Descriptions (WHAT defines each class). While prompt engineering research has extensively explored instruction optimization, class descriptions have received comparatively less attention, often being treated as fixed inputs or simple label names. This represents a critical gap for real-world classification tasks, particularly in contact center domains, where labels often suffer from ambiguous boundaries, overlapping definitions, and incomplete coverage of possible cases, all of which substantially limit accuracy regardless of instruction quality. We propose a multi-agent framework for iteratively refining class descriptions based on classification errors. By analyzing misclassified instances, language agents automatically generate improved descriptions that better capture class distinctions and resolve ambiguities. Empirical evaluation across contact center and public benchmark datasets demonstrates accuracy improvements of up to 20.71% over static class descriptions, addressing a dimension orthogonal to existing instruction optimization techniques.
Capturing organization-specific domain knowledge remains a critical challenge for deploying cost-efficient language models in specialized tasks like contact center Quality Assurance (QA). While large LMs implicitly capture expert judgment, smaller LMs require explicit evaluation criteria that domain experts struggle to articulate. We introduce Backward Question-based Refinement (BQR), a diagnostic framework that generates backward questions, revealing what a model understood rather than what was asked, to systematically distill implicit reasoning from large LMs into explicit evaluation plans. Through experiments on 12 QA questions, BQR achieves performance improvements on 8 questions with absolute gains of up to 27.8% in Macro F1. Our analysis establishes empirical parallels to gradient-descent optimization and reveals a cross-family advantage where small LMs benefit more from large LMs of different families. These findings confirm BQR as an effective approach for bridging the gap between implicit expert knowledge and explicit evaluation criteria.

2025

Classifying contact center interactions into a large number of categories is critical for downstream analytics, but challenging due to high label cardinality and cost constraints. While Large Language Models (LLMs) offer flexibility for such tasks, existing methods degrade as the label space grows, showing significant inconsistencies and sensitivity to label ordering. We propose a scalable, cost-effective two-step retrieval-augmented classification framework, enhanced with a multi-view representation of labels. Our method significantly improves accuracy and consistency over baseline LLM approaches. Experiments across 4 private and 5 open datasets yield performance improvements of up to 14.6% while reducing inference cost by 60-91% compared to baseline approaches.

2024

In this work, we present an approach that introduces different perspectives or views to improve the quality of hierarchical clustering of interaction drivers in a contact center. Specifically, we present a multi-stage approach that introduces an LLM-guided multi-view cluster representation that significantly improves the quality of generated clusters. Our approach improves average Silhouette Score by up to 70% and Human Preference Scores by 36.7% for top-level clusters compared to standard agglomerative clustering for the given business use-case. We also show how the proposed approach can be adapted to standard non-hierarchical clustering use-cases, where it achieves state-of-the-art performance on public datasets based on NMI and ACC scores, with a minimal number of LLM queries compared to the current state-of-the-art approaches. Moreover, we apply our technique to generate two new labeled datasets for hierarchical clustering. We open-source these labeled datasets, validated and corrected by domain experts, for the benefit of the research community.