Abstract
Large language models have shown impressive performance across a wide variety of tasks, including text summarization. In this paper, we show that this strong performance extends to opinion summarization. We explore several pipeline methods for applying GPT-3.5 to summarize a large collection of user reviews in a prompted fashion. To handle arbitrarily large numbers of user reviews, we explore recursive summarization as well as methods for selecting salient content to summarize through supervised clustering or extraction. On two datasets, an aspect-oriented summarization dataset of hotel reviews (SPACE) and a generic summarization dataset of Amazon and Yelp reviews (FewSum), we show that GPT-3.5 models achieve very strong performance in human evaluation. We argue that standard evaluation metrics do not reflect this, and introduce three new metrics targeting faithfulness, factuality, and genericity to contrast these different methods.
- Anthology ID:
- 2023.findings-acl.591
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2023
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 9282–9300
- URL:
- https://aclanthology.org/2023.findings-acl.591
- DOI:
- 10.18653/v1/2023.findings-acl.591
- Cite (ACL):
- Adithya Bhaskar, Alex Fabbri, and Greg Durrett. 2023. Prompted Opinion Summarization with GPT-3.5. In Findings of the Association for Computational Linguistics: ACL 2023, pages 9282–9300, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Prompted Opinion Summarization with GPT-3.5 (Bhaskar et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2023.findings-acl.591.pdf
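The recursive summarization approach mentioned in the abstract, used to handle arbitrarily large numbers of reviews, can be sketched as follows. This is a minimal illustration, not the authors' implementation: `summarize` stands in for a prompted GPT-3.5 call (here it just keeps each review's first sentence so the sketch is runnable), and the chunk size of 8 is an arbitrary assumption.

```python
def summarize(texts):
    # Placeholder for a prompted LLM call (e.g., GPT-3.5 with an
    # opinion-summarization instruction). Here we simply keep the
    # first sentence of each input so the example runs standalone.
    return " ".join(t.split(".")[0].strip() + "." for t in texts)

def recursive_summarize(reviews, chunk_size=8):
    """Summarize reviews in fixed-size chunks, then recursively
    summarize the chunk summaries until one summary remains."""
    if len(reviews) <= chunk_size:
        return summarize(reviews)
    chunks = [reviews[i:i + chunk_size]
              for i in range(0, len(reviews), chunk_size)]
    summaries = [summarize(chunk) for chunk in chunks]
    return recursive_summarize(summaries, chunk_size)
```

With a real LLM backend, each `summarize` call would condense one chunk of reviews into a short summary, and the recursion bounds the input length of any single prompt regardless of how many reviews there are in total.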