Damir Juric


2021

Towards Objectively Evaluating the Quality of Generated Medical Summaries
Francesco Moramarco | Damir Juric | Aleksandar Savkov | Ehud Reiter
Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)

We propose a method for evaluating the quality of generated text by asking evaluators to count facts and computing precision, recall, F-score, and accuracy from the raw counts. We believe this approach leads to a more objective and more easily reproducible evaluation. We apply it to the task of medical report summarisation, where measuring objective quality and accuracy is of paramount importance.
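
As a rough illustration of computing scores from raw fact counts, the sketch below assumes an evaluator records three numbers per summary: facts that are correct, facts that are incorrect, and facts that were omitted. These category names, the `FactCounts` class, and the `fact_scores` function are hypothetical and are not taken from the paper. Accuracy is left out because the abstract does not say how it is derived from the counts.

```python
from dataclasses import dataclass


@dataclass
class FactCounts:
    """Raw counts from one evaluator for one summary (assumed categories)."""
    correct: int    # facts in the summary that are supported by the source
    incorrect: int  # facts in the summary that are wrong or unsupported
    omitted: int    # source facts that should appear in the summary but do not


def fact_scores(counts: FactCounts) -> dict:
    """Compute precision, recall and F-score from raw fact counts."""
    stated = counts.correct + counts.incorrect    # everything the summary says
    expected = counts.correct + counts.omitted    # everything it should say
    precision = counts.correct / stated if stated else 0.0
    recall = counts.correct / expected if expected else 0.0
    total = precision + recall
    # F-score as the harmonic mean of precision and recall
    f_score = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper
    print(fact_scores(FactCounts(correct=6, incorrect=2, omitted=4)))
```

Under these assumptions, precision penalises incorrect statements and recall penalises omissions, so the two failure modes of a generated summary are reported separately before being combined in the F-score.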