Sasanka Vutla
2025
Scalable and Cost Effective High-Cardinality Classification with LLMs via Multi-View Label Representations and Retrieval Augmentation
Anup Pattnaik | Sasanka Vutla | Hamvir Dev | Jeevesh Nandan | Cijo George
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Classifying contact center interactions into a large number of categories is critical for downstream analytics, but challenging due to high label cardinality and cost constraints. While Large Language Models (LLMs) offer flexibility for such tasks, existing methods degrade as the label space grows, showing significant inconsistencies and sensitivity to label ordering. We propose a scalable, cost-effective two-step retrieval-augmented classification framework, enhanced with a multi-view representation of labels. Our method significantly improves accuracy and consistency over baseline LLM approaches. Experiments across 4 private and 5 open datasets yield performance improvements of up to 14.6% while reducing inference cost by 60-91% compared to baseline approaches.
2024
Improving Hierarchical Text Clustering with LLM-guided Multi-view Cluster Representation
Anup Pattnaik | Cijo George | Rishabh Kumar Tripathi | Sasanka Vutla | Jithendra Vepa
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
In this work, we present an approach that introduces different perspectives, or views, to improve the quality of hierarchical clustering of interaction drivers in a contact center. Specifically, we present a multi-stage approach built on an LLM-guided multi-view cluster representation that significantly improves the quality of the generated clusters. Our approach improves the average Silhouette Score by up to 70% and Human Preference Scores by 36.7% for top-level clusters compared to standard agglomerative clustering for the given business use case. We also show how the proposed approach can be adapted to standard non-hierarchical clustering use cases, where it achieves state-of-the-art performance on public datasets based on NMI and ACC scores, with a minimal number of LLM queries compared to the current state-of-the-art approaches. Moreover, we apply our technique to generate two new labeled datasets for hierarchical clustering. We open-source these labeled datasets, validated and corrected by domain experts, for the benefit of the research community.
Co-authors
- Cijo George 2
- Anup Pattnaik 2
- Hamvir Dev 1
- Jeevesh Nandan 1
- Rishabh Kumar Tripathi 1