Assessing the Quality of Human-Generated Summaries with Weakly Supervised Learning

Joakim Olsen, Arild Brandrud Næss, Pierre Lison


Abstract
This paper explores how to automatically measure the quality of human-generated summaries, based on a Norwegian corpus of real estate condition reports and their corresponding summaries. The proposed approach proceeds in two steps. First, the real estate reports and their associated summaries are automatically labelled using a set of heuristic rules gathered from human experts and aggregated using weak supervision. The aggregated labels are then used to train a neural model that takes a document and its summary as inputs and outputs a score reflecting the predicted quality of the summary. The neural model maps the document and its summary to a shared “summary content space” and computes the cosine similarity between the two resulting embeddings to predict the final summary quality score. The best performance is achieved by a CNN-based model, with an accuracy of 89.5% (measured against the aggregated labels obtained via weak supervision), compared to 72.6% for the best unsupervised model. Manual inspection of examples indicates that the weak supervision labels do capture important indicators of summary quality, but the correlation of those labels with human judgements remains to be validated. Our models of summary quality predict that approximately 30% of the real estate reports in the corpus have a summary of poor quality.
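The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The labelling functions, their thresholds and the Norwegian keyword "skade" (damage) are hypothetical. Plain majority voting stands in as a simplified substitute for the weak-supervision aggregation model, and the toy vectors stand in for the embeddings a trained encoder (such as the CNN-based model) would produce.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

# Label values; ABSTAIN means a heuristic has no opinion on this pair.
ABSTAIN, POOR, GOOD = -1, 0, 1

LabelFn = Callable[[str, str], int]


def lf_summary_too_short(report: str, summary: str) -> int:
    """Hypothetical rule: a summary shorter than 2% of the report is poor."""
    if len(summary.split()) < 0.02 * len(report.split()):
        return POOR
    return ABSTAIN


def lf_damage_mentioned(report: str, summary: str) -> int:
    """Hypothetical rule: if the report mentions damage ("skade"), so should the summary."""
    if "skade" in report.lower():
        return GOOD if "skade" in summary.lower() else POOR
    return ABSTAIN


def majority_vote(votes: Sequence[int]) -> int:
    """Simplified aggregation: most frequent non-abstaining vote; ties abstain."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def weak_label(report: str, summary: str, lfs: List[LabelFn]) -> int:
    """Apply every labelling function to one (report, summary) pair and aggregate."""
    return majority_vote([lf(report, summary) for lf in lfs])


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between a report embedding and a summary embedding."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    report = "Det er skade på taket. " * 10
    summary = "Taket er fint."
    print(weak_label(report, summary, [lf_summary_too_short, lf_damage_mentioned]))
    # Toy vectors standing in for encoder outputs in the shared space.
    print(round(cosine_similarity([0.2, 0.9, 0.1], [0.25, 0.8, 0.0]), 3))
```

In the approach described above, the aggregated weak labels would serve as training targets for the encoder, so that the cosine similarity between a report and its summary predicts summary quality. That training loop is omitted from this sketch.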
Anthology ID:
2021.nodalida-main.12
Volume:
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
31 May–2 June
Year:
2021
Address:
Reykjavik, Iceland (Online)
Editors:
Simon Dobnik, Lilja Øvrelid
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press, Sweden
Pages:
112–123
URL:
https://aclanthology.org/2021.nodalida-main.12
Cite (ACL):
Joakim Olsen, Arild Brandrud Næss, and Pierre Lison. 2021. Assessing the Quality of Human-Generated Summaries with Weakly Supervised Learning. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 112–123, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
Cite (Informal):
Assessing the Quality of Human-Generated Summaries with Weakly Supervised Learning (Olsen et al., NoDaLiDa 2021)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2021.nodalida-main.12.pdf