Beyond Hallucination: Reframing LLM Quality Assessment as Task-Output Alignment

Andrew Hoblitzell


Abstract
Current hallucination detection systems operate under a flawed assumption: that model outputs deviating from factual grounding are uniformly problematic regardless of task context, modality, or cultural setting. Through analysis of computational humor as a motivating case study, we demonstrate that identical model behaviors require radically different evaluations depending on context. We propose reframing hallucination detection as task-output alignment assessment, introducing a three-dimensional framework spanning factual grounding requirements, novelty requirements, and risk tolerance.
Anthology ID: 2026.bigpicture-main.3
Volume: Proceedings of The Big Picture v2: Crafting a Research Narrative
Month: July
Year: 2026
Address: San Diego, CA, USA
Editors: Yanai Elazar, Allyson Ettinger, Nora Kassner, Sebastian Ruder
Venues: BigPicture | WS
Publisher: Association for Computational Linguistics
Pages: 22–30
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.bigpicture-main.3/
Cite (ACL): Andrew Hoblitzell. 2026. Beyond Hallucination: Reframing LLM Quality Assessment as Task-Output Alignment. In Proceedings of The Big Picture v2: Crafting a Research Narrative, pages 22–30, San Diego, CA, USA. Association for Computational Linguistics.
Cite (Informal): Beyond Hallucination: Reframing LLM Quality Assessment as Task-Output Alignment (Hoblitzell, BigPicture 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.bigpicture-main.3.pdf