Abstract
While automatically computing numerical scores remains the dominant paradigm in NLP system evaluation, error analysis is receiving increasing attention, with numerous error annotation schemes being proposed for automatically generated text. However, there is little agreement about what error annotation schemes should look like, how many different types of errors should be distinguished, and at what level of granularity. In this paper, our aim is to map out recent work on annotating errors in automatically generated text, with a particular focus on error taxonomies. We describe our systematic paper selection process, and survey the error annotation schemes reported in the papers, drawing out similarities and differences between them. Finally, we characterise the issues that would make it difficult to move from the current situation to a standardised error taxonomy for annotating errors in automatically generated text.
- Anthology ID: 2022.gem-1.33
- Volume: Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
- Month: December
- Year: 2022
- Address: Abu Dhabi, United Arab Emirates (Hybrid)
- Venue: GEM
- SIG: SIGGEN
- Publisher: Association for Computational Linguistics
- Pages: 383–398
- URL: https://aclanthology.org/2022.gem-1.33
- Cite (ACL): Rudali Huidrom and Anya Belz. 2022. A Survey of Recent Error Annotation Schemes for Automatically Generated Text. In Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), pages 383–398, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Cite (Informal): A Survey of Recent Error Annotation Schemes for Automatically Generated Text (Huidrom & Belz, GEM 2022)
- PDF: https://preview.aclanthology.org/starsem-semeval-split/2022.gem-1.33.pdf