Mayank Sati

2025

pdf bib abs
STREAQ: Selective Tiered Routing for Effective and Affordable Contact Center Quality Assurance
Prajwal Sood | Rajdeep Agrawal | Mayank Sati | Digvijay Anil Ingle | Cijo George
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Contact centers process millions of customer conversations daily, requiring Quality Assurance (QA) teams to evaluate agent performance against compliance and service standards, often by answering agent evaluation questionnaires. Traditional manual QA cannot scale to growing volumes, while fully automated evaluation using large language models presents a cost-performance trade-off. High-performing models excel at detecting rare but business-critical Answers of Interest (AoI) but incur prohibitive costs, while smaller fine-tuned models are economical but suffer from poor AoI precision, generating high false positive rates that erode agent trust and waste QA resources. We introduce STREAQ, a two-tier selective routing framework to intelligently route queries between cost-efficient and high-capability models. Based on benchmarking on a proprietary dataset across six large LMs, STREAQ achieves substantial cost reduction while preserving critical performance. Using Nova-Pro, STREAQ reduces daily costs by 48% from 34,162 to17,842 while retaining 88.9% of full-model AoI precision. Our ablation studies reveal that flawed reasoning from smaller models can degrade performance, emphasizing the importance of carefully designing routing systems, making enterprise-scale automated QA both practical and economically viable.

2024

pdf bib abs
Probing the Depths of Language Models’ Contact-Center Knowledge for Quality Assurance
Digvijay Anil Ingle | Aashraya Sachdeva | Surya Prakash Sahu | Mayank Sati | Cijo George | Jithendra Vepa
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

Recent advancements in large Language Models (LMs) have significantly enhanced their capabilities across various domains, including natural language understanding and generation. In this paper, we investigate the application of LMs to the specialized task of contact-center Quality Assurance (QA), which involves evaluating conversations between human agents and customers. This task requires both sophisticated linguistic understanding and deep domain knowledge. We conduct a comprehensive assessment of eight LMs, revealing that larger models, such as Claude-3.5-Sonnet, exhibit superior performance in comprehending contact-center conversations. We introduce methodologies to transfer this domain-specific knowledge to smaller models by leveraging evaluation plans generated by more knowledgeable models, with optional human-in-the-loop refinement to enhance the capabilities of smaller models. Notably, our experimental results demonstrate an improvement of up to 18.95% in Macro F1 on an in-house QA dataset. Our findings emphasize the importance of evaluation plans in guiding reasoning and highlight the potential of AI-assisted tools to advance objective, consistent, and scalable agent evaluation processes in contact centers.

Co-authors

Prajwal Sood 1

Jithendra Vepa 1

Venues

emnlp2

Fix author