Abstract
We present our Charles-UPF submission for the Shared Task on Evaluating Accuracy in Generated Texts at INLG 2021. Our system detects errors automatically using a combination of a rule-based natural language generation (NLG) system and pretrained language models (LMs). We first use the rule-based NLG system to generate sentences expressing facts that can be derived from the input data. For each sentence under evaluation, we then select the subset of relevant facts by measuring their semantic similarity to the sentence. Finally, we finetune a pretrained LM on the annotated data together with the relevant facts for fine-grained error detection. On the test set, we achieve 69% recall and 75% precision with a model trained on a mixture of human-annotated and synthetic data.
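As a rough illustration of the fact-selection step described in the abstract, the sketch below ranks candidate facts by semantic similarity to a generated sentence. It is not the authors' implementation: the sentence-transformers model, the `top_k` cutoff, and the example facts are all assumptions chosen for illustration.

```python
# Minimal sketch of fact selection via semantic similarity.
# Assumptions (not from the paper): the embedding model, the top_k value,
# and the example fact strings are illustrative only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def select_relevant_facts(sentence: str, facts: list[str], top_k: int = 5) -> list[str]:
    """Return the top_k facts most semantically similar to the sentence."""
    sent_emb = model.encode(sentence, convert_to_tensor=True)
    fact_embs = model.encode(facts, convert_to_tensor=True)
    # Cosine similarity between the sentence and every candidate fact.
    scores = util.cos_sim(sent_emb, fact_embs)[0]
    ranked = scores.argsort(descending=True)[:top_k]
    return [facts[int(i)] for i in ranked]

# Hypothetical facts, as might be generated by a rule-based NLG system
# from a RotoWire game table.
facts = [
    "The Boston Celtics scored 105 points.",
    "Isaiah Thomas scored 23 points.",
    "The game was played at TD Garden.",
]
print(select_relevant_facts("Thomas led Boston with 23 points.", facts, top_k=2))
```

The selected facts would then be concatenated with the sentence as input context for the finetuned LM, which performs the fine-grained (token-level) error detection.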
- Anthology ID:
- 2021.inlg-1.25
- Volume:
- Proceedings of the 14th International Conference on Natural Language Generation
- Month:
- August
- Year:
- 2021
- Address:
- Aberdeen, Scotland, UK
- Editors:
- Anya Belz, Angela Fan, Ehud Reiter, Yaji Sripada
- Venue:
- INLG
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 259–265
- URL:
- https://aclanthology.org/2021.inlg-1.25
- DOI:
- 10.18653/v1/2021.inlg-1.25
- Cite (ACL):
- Zdeněk Kasner, Simon Mille, and Ondřej Dušek. 2021. Text-in-Context: Token-Level Error Detection for Table-to-Text Generation. In Proceedings of the 14th International Conference on Natural Language Generation, pages 259–265, Aberdeen, Scotland, UK. Association for Computational Linguistics.
- Cite (Informal):
- Text-in-Context: Token-Level Error Detection for Table-to-Text Generation (Kasner et al., INLG 2021)
- PDF:
- https://aclanthology.org/2021.inlg-1.25.pdf
- Code:
- kasnerz/accuracysharedtask_cuni-upf
- Data:
- RotoWire