Pedro Vidigal


2026

We present a five-year case study of sentiment model selection for customer satisfaction (CSAT) prediction in B2B technical support. Our evaluation uses the complete population of CSAT-rated tickets from an enterprise software company: over 500 tickets comprising 2,500 customer comments from 100+ organizations over five years. We evaluate 17 approaches across 5 paradigms (lexicon, off-the-shelf transformers, NLI zero-shot, multi-task LLM agent, and 12 dedicated LLM agents from 6 vendor families), plus 11 fine-tuning experiments (all achieving MCC0). Key findings: (1) a dedicated single-task LLM agent reduces neutral bias from 69% to 22%, improving MCC from -0.018 to 0.347 (p<0.001); (2) our results are consistent with the "Alignment Tax" (Lin et al., 2024; Wu et al., 2025) in sentiment classification: Claude Opus 4.6 exhibits 41% neutral predictions and lower recall than its budget model Haiku 4.5 (p=0.003); (3) 38% of dissatisfied customers are undetectable by all 12 LLMs due to administrative requests lacking emotional language; (4) Gemini 3 Flash achieves the best MCC (0.347) at 0.60/1K, over 100× cheaper than Claude Opus. We describe the three-phase production deployment and provide practitioner recommendations.
Search
Co-authors
    Venues
    Fix author