Evaluating the Reliability of LLMs in Faithfully Updating Text: An Empirical Study

Ayan Datta; Paheli Bhattacharya; Rishabh Gupta

Abstract

We provide a comprehensive review of the FRUIT (Faithfully Reflecting Updated Information in Text) task, which formalizes the challenge of accurately updating text with large language models (LLMs). We begin with an in-depth analysis of the FRUIT dataset that reveals key structural insights. We then investigate the unsupervised capabilities of LLMs, including zero-shot learning, chain-of-thought reasoning, self-reflection, and evidence ordering. Experimental results show that unsupervised approaches perform competitively with supervised methods for faithful text updating. Qualitative analysis shows that updates based on table-structured evidence outperform those based on unstructured text. We also discuss important limitations, including the need for new datasets and the risk of information leakage in this domain. These findings have significant implications for applications that require precise document updates, such as software engineering, technical documentation, and legal document maintenance.
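The abstract lists evidence ordering among the unsupervised strategies and reports that table-structured evidence tends to help. As a purely hypothetical illustration (not the authors' prompt, code, or data format), the sketch below assumes each evidence item is tagged as table or text and builds a zero-shot update prompt that places table evidence first. The names `Evidence`, `order_evidence`, and `build_update_prompt`, the `is_table` flag, and the prompt wording are all invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    text: str
    is_table: bool  # hypothetical flag: True if the evidence came from a table


def order_evidence(evidence: List[Evidence], tables_first: bool = True) -> List[Evidence]:
    # sorted() is stable, so items keep their original order within each group.
    # With tables_first=True the key is False (0) for tables, so they sort first.
    return sorted(evidence, key=lambda e: (not e.is_table) if tables_first else e.is_table)


def build_update_prompt(source_text: str, evidence: List[Evidence], tables_first: bool = True) -> str:
    # Assemble a zero-shot instruction, the original article, and the ordered evidence.
    ordered = order_evidence(evidence, tables_first)
    lines = [
        "Update the article so that it faithfully reflects the new evidence.",
        "Only add or change content that the evidence supports.",
        "",
        "Article:",
        source_text,
        "",
        "Evidence:",
    ]
    for i, ev in enumerate(ordered, start=1):
        kind = "TABLE" if ev.is_table else "TEXT"
        lines.append(f"[{i}] ({kind}) {ev.text}")
    lines += ["", "Updated article:"]
    return "\n".join(lines)
```

Passing `tables_first=False` produces the opposite ordering, so the two orderings can be compared with the same model; which ordering actually works better is an empirical question the sketch does not answer.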

Anthology ID: 2026.gem-main.28
Volume: Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues: GEM | WS
Publisher: Association for Computational Linguistics
Pages: 271–284
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.28/
Cite (ACL): Ayan Datta, Paheli Bhattacharya, and Rishabh Gupta. 2026. Evaluating the Reliability of LLMs in Faithfully Updating Text: An Empirical Study. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 271–284, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Evaluating the Reliability of LLMs in Faithfully Updating Text: An Empirical Study (Datta et al., GEM 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.28.pdf
