Solving the Task but Not the Problem: A Customer Support Case Study on Why Extrinsic Evaluation Matters

Daniel Braun


Abstract
Natural Language Processing has long been used in customer support to automate and augment human agents. Despite this long-standing use and clear practical relevance, most scientific studies rely on intrinsic evaluations with metrics such as accuracy or F1-score. In this paper, we argue that such evaluations often fail to reflect real-world system impact. We present a case study of an NLP system for email-based customer support, evaluated both intrinsically and extrinsically via a before-and-after study in deployment. While the system achieves strong intrinsic performance, we observe no measurable improvement in key operational metrics such as average handle time per email. These results highlight a mismatch between benchmark performance and real-world effectiveness, supporting calls for more systematic extrinsic evaluation of NLP systems.
Anthology ID:
2026.retroeval-main.7
Volume:
Proceedings of the 1st Symposium on Natural Language Generation Evaluations
Month:
June
Year:
2026
Address:
Aberdeen, United Kingdom
Editors:
Saad Mahamood, David M. Howcroft, Kees van Deemter, Simone Balloccu, Adarsa Sivaprasad, Barkavi Sundararajan, Alberto Bugarín Diz, Jose María Alonso-Moral
Venue:
RetroEval
Publisher:
Association for Computational Linguistics
Pages:
53–62
URL:
https://preview.aclanthology.org/ingest-retroeval/2026.retroeval-main.7/
Cite (ACL):
Daniel Braun. 2026. Solving the Task but Not the Problem: A Customer Support Case Study on Why Extrinsic Evaluation Matters. In Proceedings of the 1st Symposium on Natural Language Generation Evaluations, pages 53–62, Aberdeen, United Kingdom. Association for Computational Linguistics.
Cite (Informal):
Solving the Task but Not the Problem: A Customer Support Case Study on Why Extrinsic Evaluation Matters (Braun, RetroEval 2026)
PDF:
https://preview.aclanthology.org/ingest-retroeval/2026.retroeval-main.7.pdf