Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference

Ondřej Dušek, Zdeněk Kasner


Abstract
A major challenge in evaluating data-to-text (D2T) generation is measuring the semantic accuracy of the generated text, i.e. checking if the output text contains all and only facts supported by the input data. We propose a new metric for evaluating the semantic accuracy of D2T generation based on a neural model pretrained for natural language inference (NLI). We use the NLI model to check textual entailment between the input data and the output text in both directions, allowing us to reveal omissions or hallucinations. Input data are converted to text for NLI using trivial templates. Our experiments on two recent D2T datasets show that our metric can achieve high accuracy in identifying erroneous system outputs.
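The bidirectional check described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `nli(premise, hypothesis)` callable (returning `"entailment"`, `"neutral"`, or `"contradiction"`), the triple format, and the template wording are all assumptions for the sake of the example.

```python
def verbalize(triples):
    # Trivial templates: turn each (subject, predicate, object) data
    # triple into a simple declarative sentence, then concatenate.
    return " ".join(f"{s} {p} {o}." for s, p, o in triples)

def semantic_accuracy(triples, output_text, nli):
    """Check entailment in both directions with an NLI callable that
    maps (premise, hypothesis) -> 'entailment'|'neutral'|'contradiction'.
    The per-fact granularity used here is illustrative."""
    data_text = verbalize(triples)
    # data => output: if the data do not entail the output text,
    # the output contains unsupported content (a hallucination).
    hallucination = nli(data_text, output_text) != "entailment"
    # output => data: a verbalized fact not entailed by the output
    # was omitted from the generated text.
    omission = any(
        nli(output_text, f"{s} {p} {o}.") != "entailment"
        for s, p, o in triples
    )
    return {"hallucination": hallucination,
            "omission": omission,
            "ok": not (hallucination or omission)}
```

In practice, `nli` would wrap a pretrained NLI model (e.g. one fine-tuned on MNLI); here it is left abstract so the control flow of the two-direction check stays visible.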
Anthology ID:
2020.inlg-1.19
Volume:
Proceedings of the 13th International Conference on Natural Language Generation
Month:
December
Year:
2020
Address:
Dublin, Ireland
Editors:
Brian Davis, Yvette Graham, John Kelleher, Yaji Sripada
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
131–137
URL:
https://aclanthology.org/2020.inlg-1.19
DOI:
10.18653/v1/2020.inlg-1.19
Cite (ACL):
Ondřej Dušek and Zdeněk Kasner. 2020. Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference. In Proceedings of the 13th International Conference on Natural Language Generation, pages 131–137, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference (Dušek & Kasner, INLG 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2020.inlg-1.19.pdf
Supplementary attachment:
 2020.inlg-1.19.Supplementary_Attachment.pdf
Code
 ufal/nlgi_eval