Fine-Grained Error Analysis and Fair Evaluation of Labeled Spans

Katrin Ortmann


Abstract
The traditional evaluation of labeled spans with precision, recall, and F1-score has undesirable effects due to double penalties. Annotations with an incorrect label or incorrect boundaries count as two errors instead of one, despite being closer to the target annotation than false positives or false negatives. In this paper, new error types are introduced that more accurately reflect true annotation quality and ensure that every annotation counts only once. An algorithm for error identification in flat and multi-level annotations is presented and complemented with a proposal on how to calculate meaningful precision, recall, and F1-scores based on the more fine-grained error types. An example application to three different annotation tasks (NER, chunking, parsing) shows that the suggested procedure not only prevents double penalties but also allows for a more detailed error analysis, thereby providing more insight into the actual weaknesses of a system.
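The double penalty described in the abstract can be illustrated with a minimal sketch of traditional exact-match counting. This is not the paper's algorithm or the faireval implementation; the function names `traditional_counts` and `prf`, the `(label, start, end)` span format, and the toy data are all assumptions made for illustration only.

```python
# Illustrative sketch only: traditional exact-match evaluation of labeled spans.
# Spans are hypothetical (label, start, end) tuples; names are not from the paper.

def traditional_counts(gold, pred):
    """Count TP/FP/FN where a span is correct only if label AND boundaries match."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)   # exact matches
    fp = len(pred_set - gold_set)   # predicted spans with no exact gold match
    fn = len(gold_set - pred_set)   # gold spans with no exact predicted match
    return tp, fp, fn

def prf(tp, fp, fn):
    """Standard precision, recall, and F1 from the three counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [("PER", 0, 1), ("LOC", 5, 6)]

# System A finds both spans but mislabels the first one.
pred_label_error = [("ORG", 0, 1), ("LOC", 5, 6)]
# System B misses the first span entirely.
pred_missing = [("LOC", 5, 6)]

counts_a = traditional_counts(gold, pred_label_error)  # one FP and one FN for a single mistake
counts_b = traditional_counts(gold, pred_missing)      # only one FN

f1_a = prf(*counts_a)[2]
f1_b = prf(*counts_b)[2]
print(counts_a, round(f1_a, 3))
print(counts_b, round(f1_b, 3))
```

Under these assumptions, the single labeling mistake of system A is counted as both a false positive and a false negative, so A receives a lower F1 than system B, which never found the span at all. This is the kind of effect the fine-grained error types are meant to avoid.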
Anthology ID:
2022.lrec-1.150
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
1400–1407
URL:
https://aclanthology.org/2022.lrec-1.150
Cite (ACL):
Katrin Ortmann. 2022. Fine-Grained Error Analysis and Fair Evaluation of Labeled Spans. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1400–1407, Marseille, France. European Language Resources Association.
Cite (Informal):
Fine-Grained Error Analysis and Fair Evaluation of Labeled Spans (Ortmann, LREC 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2022.lrec-1.150.pdf
Code
 rubcompling/faireval