2021
Towards Objectively Evaluating the Quality of Generated Medical Summaries
Francesco Moramarco, Damir Juric, Aleksandar Savkov, Ehud Reiter
Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)
We propose a method for evaluating the quality of generated text in which evaluators count facts, and precision, recall, F-score, and accuracy are then computed from the raw counts. We believe this approach yields an evaluation that is more objective and easier to reproduce. We apply it to the task of medical report summarisation, where measuring objective quality and accuracy is of paramount importance.
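As an illustration of how scores can be derived from raw fact counts, the following is a minimal sketch. It assumes that an evaluator reports three counts per generated summary: facts that are correct (treated as true positives), facts that are incorrect or unsupported (false positives), and facts that were omitted (false negatives). The `FactCounts` and `fact_scores` names and this mapping of counts are hypothetical and are not taken from the paper. The abstract also does not say how accuracy is derived from the counts, so the sketch leaves it out.

```python
from dataclasses import dataclass


@dataclass
class FactCounts:
    """Hypothetical raw counts from one evaluator for one generated summary."""
    correct: int    # facts in the summary judged correct (assumed true positives)
    incorrect: int  # facts in the summary judged wrong or unsupported (assumed false positives)
    omitted: int    # facts the summary should have included but did not (assumed false negatives)


def fact_scores(c: FactCounts) -> dict[str, float]:
    """Compute precision, recall and F1 from raw fact counts (a sketch, not the paper's code)."""
    generated = c.correct + c.incorrect  # all facts the summary asserts
    expected = c.correct + c.omitted     # all facts the summary should contain
    # Guard against empty denominators by returning 0.0 instead of raising.
    precision = c.correct / generated if generated else 0.0
    recall = c.correct / expected if expected else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


if __name__ == "__main__":
    # Example: 8 correct facts, 2 incorrect facts, 4 omitted facts.
    print(fact_scores(FactCounts(correct=8, incorrect=2, omitted=4)))
```

With 8 correct, 2 incorrect and 4 omitted facts, precision is 8/10, recall is 8/12, and the F-score equals 2·TP / (2·TP + FP + FN) = 16/22. Working from counts in this way means any two teams that agree on the counts will compute identical scores, which is the reproducibility argument the abstract makes.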