SUBSUME: A Dataset for Subjective Summary Extraction from Wikipedia Documents
Nishant Yadav, Matteo Brucato, Anna Fariha, Oscar Youngquist, Julian Killingback, Alexandra Meliou, Peter Haas
Abstract
Many applications require generation of summaries tailored to the user’s information needs, i.e., their intent. Methods that express intent via explicit user queries fall short when query interpretation is subjective. Several datasets exist for summarization with objective intents where, for each document and intent (e.g., “weather”), a single summary suffices for all users. No datasets exist, however, for subjective intents (e.g., “interesting places”) where different users will provide different summaries. We present SUBSUME, the first dataset for evaluation of SUBjective SUMmary Extraction systems. SUBSUME contains 2,200 (document, intent, summary) triplets over 48 Wikipedia pages, with ten intents of varying subjectivity, provided by 103 individuals over Mechanical Turk. We demonstrate statistically that the intents in SUBSUME vary systematically in subjectivity. To indicate SUBSUME’s usefulness, we explore a collection of baseline algorithms for subjective extractive summarization and show that (i) as expected, example-based approaches better capture subjective intents than query-based ones, and (ii) there is ample scope for improving upon the baseline algorithms, thereby motivating further research on this challenging problem.
- Anthology ID:
- 2021.newsum-1.14
- Volume:
- Proceedings of the Third Workshop on New Frontiers in Summarization
- Month:
- November
- Year:
- 2021
- Address:
- Online and in Dominican Republic
- Editors:
- Giuseppe Carenini, Jackie Chi Kit Cheung, Yue Dong, Fei Liu, Lu Wang
- Venue:
- NewSum
- Publisher:
- Association for Computational Linguistics
- Pages:
- 131–141
- URL:
- https://aclanthology.org/2021.newsum-1.14
- DOI:
- 10.18653/v1/2021.newsum-1.14
- Cite (ACL):
- Nishant Yadav, Matteo Brucato, Anna Fariha, Oscar Youngquist, Julian Killingback, Alexandra Meliou, and Peter Haas. 2021. SUBSUME: A Dataset for Subjective Summary Extraction from Wikipedia Documents. In Proceedings of the Third Workshop on New Frontiers in Summarization, pages 131–141, Online and in Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- SUBSUME: A Dataset for Subjective Summary Extraction from Wikipedia Documents (Yadav et al., NewSum 2021)
- PDF:
- https://preview.aclanthology.org/corrections-2024-05/2021.newsum-1.14.pdf
- Data:
- SubSumE, CNN/Daily Mail