John Bohannon


2021

Is Human Scoring the Best Criteria for Summary Evaluation?
Oleg Vasilyev | John Bohannon
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

ESTIME: Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings
Oleg Vasilyev | John Bohannon
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems

We propose a new reference-free summary quality evaluation measure, with an emphasis on faithfulness. The measure is based on finding and counting all probable inconsistencies of the summary with respect to the source document. The proposed ESTIME, Estimator of Summary-to-Text Inconsistency by Mismatched Embeddings, correlates with expert scores on the summary-level SummEval dataset more strongly than other common evaluation measures, not only in Consistency but also in Fluency. We also introduce a method for generating subtle factual errors in human-written summaries, and we show that ESTIME is more sensitive to these errors than other common evaluation measures.
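The mismatched-embeddings idea can be illustrated with a short sketch. The code below is a simplification under stated assumptions: it takes contextual embeddings from one hidden layer of an off-the-shelf BERT encoder and, for each summary token, checks whether the most similar document embedding belongs to a different token. The published ESTIME additionally masks tokens before embedding them and tunes the choice of layer; all names here are illustrative, not the authors' reference code.

```python
# Simplified sketch of counting ESTIME-style "alarms": summary tokens whose
# nearest document embedding (by cosine similarity) carries a different token.
# Assumptions: hidden layer 9 of bert-base-uncased; no masking (a simplification).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True).eval()

def token_embeddings(text: str, layer: int = 9):
    """Return token ids and their contextual embeddings from one hidden layer."""
    batch = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = enc(**batch).hidden_states[layer][0]  # (seq_len, dim)
    return batch["input_ids"][0], hidden

def estime_alarms(summary: str, document: str) -> int:
    """Count summary tokens mismatched with their nearest document embedding."""
    s_ids, s_emb = token_embeddings(summary)
    d_ids, d_emb = token_embeddings(document)
    s_norm = torch.nn.functional.normalize(s_emb, dim=1)
    d_norm = torch.nn.functional.normalize(d_emb, dim=1)
    nearest = (s_norm @ d_norm.T).argmax(dim=1)        # most similar document position
    return int((s_ids != d_ids[nearest]).sum())        # mismatch = potential inconsistency
```

A higher alarm count suggests lower faithfulness. In this simplified form the count also covers special tokens such as [CLS] and [SEP], which a practical implementation would filter out.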

2020

Fill in the BLANC: Human-free quality estimation of document summaries
Oleg Vasilyev | Vedant Dharnidharka | John Bohannon
Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems

We present BLANC, a new approach to the automatic estimation of document summary quality. Our goal is to measure the functional performance of a summary with an objective, reproducible, and fully automated method. Our approach achieves this by measuring the performance boost gained by a pre-trained language model when it has access to a document summary while carrying out its language understanding task on the document’s text. We present evidence that BLANC scores correlate with human evaluations as well as the ROUGE family of summary quality measures does. Unlike ROUGE, the BLANC method does not require human-written reference summaries, allowing for fully human-free summary quality estimation.
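As a concrete illustration of the "performance boost" measurement, here is a minimal sketch of the idea behind the BLANC-help variant, assuming a BERT masked language model from the Hugging Face transformers library. Masking one token per pass and using a dot filler as the same-length neutral prefix are simplifications of the published masking scheme; the names below are illustrative and this is not the authors' reference implementation.

```python
# Minimal sketch of the BLANC-help idea: how much better does a masked LM
# reconstruct masked document tokens when the summary is prepended, compared
# with a same-length neutral filler? Assumptions: bert-base-uncased; one
# masked token per forward pass (the published scheme masks tokens in groups).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def unmask_accuracy(prefix_ids, sent_ids):
    """Fraction of sentence tokens the model restores when each is masked in turn."""
    hits = 0
    for i in range(len(sent_ids)):
        masked = list(sent_ids)
        masked[i] = tok.mask_token_id
        ids = ([tok.cls_token_id] + prefix_ids + [tok.sep_token_id]
               + masked + [tok.sep_token_id])
        with torch.no_grad():
            logits = mlm(input_ids=torch.tensor([ids])).logits
        pos = len(prefix_ids) + 2 + i                  # index of the masked token
        hits += int(logits[0, pos].argmax().item() == sent_ids[i])
    return hits / max(len(sent_ids), 1)

def blanc_help(summary: str, sentence: str) -> float:
    """Accuracy boost from seeing the summary vs. a neutral same-length filler."""
    sum_ids = tok(summary, add_special_tokens=False)["input_ids"]
    sent_ids = tok(sentence, add_special_tokens=False)["input_ids"]
    filler = [tok.convert_tokens_to_ids(".")] * len(sum_ids)  # neutral prefix
    return unmask_accuracy(sum_ids, sent_ids) - unmask_accuracy(filler, sent_ids)
```

Averaging blanc_help over a document's sentences gives a document-level score; a positive value means the summary genuinely helps the model understand the text.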