Towards a Reliable and Robust Methodology for Crowd-Based Subjective Quality Assessment of Query-Based Extractive Text Summarization

Neslihan Iskender, Tim Polzehl, Sebastian Möller


Abstract
Intrinsic and extrinsic quality evaluation is an essential part of summary evaluation methodology and is usually conducted in a traditional controlled laboratory environment. However, evaluating large text corpora with these methods proves expensive from both an organizational and a financial perspective. For the first time, and as a fast, scalable, and cost-effective alternative, we propose micro-task crowdsourcing to evaluate both the intrinsic and the extrinsic quality of query-based extractive text summaries. To investigate the appropriateness of crowdsourcing for this task, we conduct extensive comparative crowdsourcing and laboratory experiments, evaluating nine extrinsic and intrinsic quality measures on 5-point mean opinion score (MOS) scales. Correlating crowd and laboratory ratings reveals high applicability of crowdsourcing for the factors overall quality, grammaticality, non-redundancy, referential clarity, focus, structure & coherence, summary usefulness, and summary informativeness. Further, we investigate the effect of the number of repeated assessments on the robustness of the crowd MOS, measured by the increase in correlation coefficients between crowd and laboratory ratings. Our results suggest that the optimal number of repetitions in crowdsourcing setups, beyond which additional repetitions no longer yield an adequate increase in the overall correlation coefficients, lies between seven and nine for both intrinsic and extrinsic quality factors.
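The repetition analysis described in the abstract, averaging k crowd ratings per summary into a MOS and checking how the crowd-laboratory correlation grows with k, can be illustrated roughly as below. This is a minimal sketch, not the authors' published code: the ratings matrix `crowd` of shape (summaries x repetitions), the laboratory MOS vector `lab`, the function name, the random subsampling scheme, and the choice of Spearman's rho as the correlation measure are all assumptions.

```python
"""Hypothetical sketch of a repetition-robustness analysis for crowd MOS.
Assumes 5-point ratings; shapes and names are illustrative, not from the paper."""
import numpy as np
from scipy.stats import spearmanr


def crowd_lab_correlation_by_repetitions(crowd, lab, trials=100, seed=0):
    """crowd: (n_items, n_reps) matrix of crowd ratings per summary.
    lab: (n_items,) vector of laboratory MOS values for the same summaries.
    Returns, for each k = 1..n_reps, the correlation between the k-repetition
    crowd MOS and the laboratory MOS, averaged over random subsamples."""
    rng = np.random.default_rng(seed)
    n_items, n_reps = crowd.shape
    mean_rho = []
    for k in range(1, n_reps + 1):
        rhos = []
        for _ in range(trials):
            # Subsample k of the available repetitions and average them
            # into a k-repetition crowd MOS per summary.
            cols = rng.choice(n_reps, size=k, replace=False)
            crowd_mos = crowd[:, cols].mean(axis=1)
            rho, _ = spearmanr(crowd_mos, lab)
            rhos.append(rho)
        mean_rho.append(float(np.mean(rhos)))
    return mean_rho
```

On such a curve, an optimal repetition count can be read off as the smallest k beyond which the marginal gain in correlation drops below a chosen threshold; the paper reports this plateau between seven and nine repetitions for the intrinsic and extrinsic quality factors.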
Anthology ID:
2020.lrec-1.31
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
245–253
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.31
Cite (ACL):
Neslihan Iskender, Tim Polzehl, and Sebastian Möller. 2020. Towards a Reliable and Robust Methodology for Crowd-Based Subjective Quality Assessment of Query-Based Extractive Text Summarization. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 245–253, Marseille, France. European Language Resources Association.
Cite (Informal):
Towards a Reliable and Robust Methodology for Crowd-Based Subjective Quality Assessment of Query-Based Extractive Text Summarization (Iskender et al., LREC 2020)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2020.lrec-1.31.pdf