Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary

Daniel Deutsch, Tania Bedrax-Weiss, Dan Roth


Abstract
A desirable property of a reference-based evaluation metric that measures the content quality of a summary is that it should estimate how much information that summary has in common with a reference. Traditional text-overlap-based metrics such as ROUGE fail to achieve this because they are limited to matching tokens, either lexically or via embeddings. In this work, we propose a metric to evaluate the content quality of a summary using question-answering (QA). QA-based methods directly measure a summary's information overlap with a reference, making them fundamentally different from text overlap metrics. We demonstrate the experimental benefits of QA-based metrics through an analysis of our proposed metric, QAEval. QAEval outperforms current state-of-the-art metrics on most evaluations using benchmark datasets, while being competitive on others due to limitations of state-of-the-art models. Through a careful analysis of each component of QAEval, we identify its performance bottlenecks and estimate that its potential upper-bound performance surpasses all other automatic metrics, approaching that of the gold-standard Pyramid Method.
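To make the QA-based idea concrete, the sketch below shows the general recipe the abstract describes: questions derived from the reference summary are answered against the candidate summary, and the score is the fraction answered correctly. This is a minimal illustration, not the authors' QAEval implementation; the HuggingFace model name, the hand-written QA pairs, and the exact-match scoring are all illustrative assumptions.

```python
# Minimal sketch of a QA-based content-quality score (NOT the paper's
# QAEval code). Assumes the HuggingFace `transformers` library; the model
# choice and exact-match scoring are illustrative, not from the paper.
from transformers import pipeline

# Extractive QA model that answers a question against a given context.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

def qa_content_score(summary: str, qa_pairs: list[tuple[str, str]]) -> float:
    """Score a summary by the fraction of reference-derived questions
    whose predicted answer span exactly matches the gold answer."""
    if not qa_pairs:
        return 0.0
    correct = 0
    for question, gold_answer in qa_pairs:
        pred = qa(question=question, context=summary)["answer"]
        if pred.strip().lower() == gold_answer.strip().lower():
            correct += 1
    return correct / len(qa_pairs)

# Hypothetical QA pairs that would normally be generated automatically
# from the reference summary.
pairs = [
    ("What does the proposed metric measure?", "content quality"),
    ("What is the proposed metric called?", "QAEval"),
]
summary = ("QAEval is a metric that measures the content quality "
           "of a summary using question answering.")
print(qa_content_score(summary, pairs))
```

In the paper's full pipeline the QA pairs are generated automatically from the reference and answers are compared with a learned verifier rather than exact match; the sketch above only conveys the information-overlap intuition.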
Anthology ID:
2021.tacl-1.47
Volume:
Transactions of the Association for Computational Linguistics, Volume 9
Year:
2021
Address:
Cambridge, MA
Editors:
Brian Roark, Ani Nenkova
Venue:
TACL
Publisher:
MIT Press
Pages:
774–789
URL:
https://aclanthology.org/2021.tacl-1.47
DOI:
10.1162/tacl_a_00397
Cite (ACL):
Daniel Deutsch, Tania Bedrax-Weiss, and Dan Roth. 2021. Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary. Transactions of the Association for Computational Linguistics, 9:774–789.
Cite (Informal):
Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary (Deutsch et al., TACL 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2021.tacl-1.47.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-4/2021.tacl-1.47.mp4