Assessing Semantic Consistency in Data‐to‐Text Generation: A Meta-Evaluation of Textual, Semantic and Model-Based Metrics
Rudali Huidrom, Michela Lorandi, Simon Mille, Craig Thomson, Anya Belz
Abstract
Ensuring semantic consistency between semantic-triple inputs and generated text is crucial in data‐to‐text generation, but it remains challenging both during generation and in evaluation. To determine how accurately semantic consistency can currently be assessed, we meta-evaluate 29 evaluation methods in terms of their ability to predict human semantic-consistency ratings. The evaluation methods include embedding‐based, overlap‐based, and edit‐distance metrics, as well as learned regressors and a prompted ‘LLM‐as‐judge’ protocol. We meta-evaluate on two datasets: the WebNLG 2017 human evaluation dataset, and a newly created WebNLG-style dataset that none of the methods can have seen during training. We find that neither the traditional textual similarity metrics nor the pre-Transformer model-based metrics are suitable for semantic consistency assessment. LLM-based methods perform well on the whole, but their best correlations with human judgments still lag behind those seen in other text generation tasks.
- Anthology ID:
- 2025.inlg-main.6
- Volume:
- Proceedings of the 18th International Natural Language Generation Conference
- Month:
- October
- Year:
- 2025
- Address:
- Hanoi, Vietnam
- Editors:
- Lucie Flek, Shashi Narayan, Lê Hồng Phương, Jiahuan Pei
- Venue:
- INLG
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 98–107
- URL:
- https://preview.aclanthology.org/author-page-you-zhang-rochester/2025.inlg-main.6/
- Cite (ACL):
- Rudali Huidrom, Michela Lorandi, Simon Mille, Craig Thomson, and Anya Belz. 2025. Assessing Semantic Consistency in Data‐to‐Text Generation: A Meta-Evaluation of Textual, Semantic and Model-Based Metrics. In Proceedings of the 18th International Natural Language Generation Conference, pages 98–107, Hanoi, Vietnam. Association for Computational Linguistics.
- Cite (Informal):
- Assessing Semantic Consistency in Data‐to‐Text Generation: A Meta-Evaluation of Textual, Semantic and Model-Based Metrics (Huidrom et al., INLG 2025)
- PDF:
- https://preview.aclanthology.org/author-page-you-zhang-rochester/2025.inlg-main.6.pdf
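
The meta-evaluation described in the abstract scores each evaluation method by how well its outputs track human semantic-consistency ratings. The sketch below illustrates one common way to do this, assuming Spearman rank correlation as the agreement measure; the abstract does not say which coefficient the authors use. The metric names and all numbers are hypothetical placeholders, not data or results from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical human ratings for five generated texts (illustration only).
human_ratings = [1, 2, 3, 4, 5]

# Hypothetical scores from two made-up metrics for the same five texts.
metric_scores = {
    "metric_a": [0.10, 0.35, 0.40, 0.80, 0.95],
    "metric_b": [0.90, 0.20, 0.60, 0.30, 0.50],
}

# One correlation per metric: higher means closer agreement with humans.
correlations = {
    name: spearman(scores, human_ratings)
    for name, scores in metric_scores.items()
}

for name, rho in sorted(correlations.items(), key=lambda kv: -kv[1]):
    print(f"{name}: rho = {rho:.3f}")
```

Ranking the metrics by the resulting coefficient gives a simple ordering of how well each one predicts the human ratings. In practice a library routine such as `scipy.stats.spearmanr` would normally replace the hand-written functions.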