A Survey of Recent Error Annotation Schemes for Automatically Generated Text

Rudali Huidrom, Anya Belz


Abstract
While automatically computing numerical scores remains the dominant paradigm in NLP system evaluation, error analysis is receiving increasing attention, with numerous error annotation schemes being proposed for automatically generated text. However, there is little agreement about what error annotation schemes should look like, how many different types of errors should be distinguished, and at what level of granularity. In this paper, our aim is to map out recent work on annotating errors in automatically generated text, with a particular focus on error taxonomies. We describe our systematic paper selection process, and survey the error annotation schemes reported in the papers, drawing out similarities and differences between them. Finally, we characterise the issues that would make it difficult to move from the current situation to a standardised error taxonomy for annotating errors in automatically generated text.
Anthology ID:
2022.gem-1.33
Volume:
Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Venue:
GEM
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
383–398
URL:
https://aclanthology.org/2022.gem-1.33
Cite (ACL):
Rudali Huidrom and Anya Belz. 2022. A Survey of Recent Error Annotation Schemes for Automatically Generated Text. In Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), pages 383–398, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
A Survey of Recent Error Annotation Schemes for Automatically Generated Text (Huidrom & Belz, GEM 2022)
PDF:
https://preview.aclanthology.org/starsem-semeval-split/2022.gem-1.33.pdf
Video:
https://preview.aclanthology.org/starsem-semeval-split/2022.gem-1.33.mp4