When LLMs Can’t Help: Real-World Evaluation of LLMs in Nutrition

Karen Jia-Hui Li; Simone Balloccu; Ondřej Dušek; Ehud Reiter

Abstract

The increasing trust in large language models (LLMs), especially in the form of chatbots, is often undermined by a lack of extrinsic evaluation. This holds particularly true in nutrition, where randomised controlled trials (RCTs) are the gold standard, and experts demand them for evidence-based deployment. LLMs have shown promising results in this field, but these are limited to intrinsic setups. We address this gap by running the first RCT involving LLMs for nutrition. We augment a rule-based chatbot with two LLM-based features: (1) message rephrasing for conversational variety and engagement, and (2) nutritional counselling through a fine-tuned model. In our seven-week RCT (n=81), we compare chatbot variants with and without LLM integration. We measure effects on dietary outcomes, emotional well-being, and engagement. Although our LLM-based features perform well in intrinsic evaluation, we find that they did not yield consistent benefits in real-world deployment. These results highlight critical gaps between intrinsic evaluations and real-world impact, emphasising the need for interdisciplinary, human-centred approaches.

Anthology ID: 2025.inlg-main.44
Volume: Proceedings of the 18th International Natural Language Generation Conference
Month: October
Year: 2025
Address: Hanoi, Vietnam
Editors: Lucie Flek, Shashi Narayan, Lê Hồng Phương, Jiahuan Pei
Venue: INLG
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 753–779
URL: https://preview.aclanthology.org/author-page-lei-gao-usc/2025.inlg-main.44/
Cite (ACL): Karen Jia-Hui Li, Simone Balloccu, Ondrej Dusek, and Ehud Reiter. 2025. When LLMs Can’t Help: Real-World Evaluation of LLMs in Nutrition. In Proceedings of the 18th International Natural Language Generation Conference, pages 753–779, Hanoi, Vietnam. Association for Computational Linguistics.
Cite (Informal): When LLMs Can’t Help: Real-World Evaluation of LLMs in Nutrition (Li et al., INLG 2025)
PDF: https://preview.aclanthology.org/author-page-lei-gao-usc/2025.inlg-main.44.pdf
