Simple Yet Effective Synthetic Dataset Construction for Unsupervised Opinion Summarization

Ming Shen, Jie Ma, Shuai Wang, Yogarshi Vyas, Kalpit Dixit, Miguel Ballesteros, Yassine Benajiba


Abstract
Opinion summarization provides an important solution for summarizing opinions expressed among a large number of reviews. However, generating aspect-specific and general summaries is challenging due to the lack of annotated data. In this work, we propose two simple yet effective unsupervised approaches to generate both aspect-specific and general opinion summaries by training on synthetic datasets constructed with aspect-related review contents. Our first approach, Seed Words Based Leave-One-Out (SW-LOO), identifies aspect-related portions of reviews simply by exact-matching aspect seed words and outperforms existing methods by 3.4 ROUGE-L points on Space and 0.5 ROUGE-1 points on Oposum+ for aspect-specific opinion summarization. Our second approach, Natural Language Inference Based Leave-One-Out (NLI-LOO), identifies aspect-related sentences using an NLI model in a more general setting without seed words. It outperforms existing approaches by 1.2 ROUGE-L points on Space for aspect-specific opinion summarization and remains competitive on other metrics.
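To make the seed-word idea concrete, the following is a minimal sketch of how exact-matching aspect seed words could select aspect-related sentences and how a leave-one-out pairing could turn them into synthetic training examples. It is an illustration under stated assumptions, not the authors' implementation: the seed words, the naive sentence splitter, the function names, and the exact way inputs and pseudo-summaries are paired are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical seed words per aspect; illustrative only, not from the paper.
SEED_WORDS = {
    "cleanliness": {"clean", "dirty", "spotless"},
    "location": {"location", "downtown", "walk"},
}


def split_sentences(review):
    """Naive splitter on '.', '!' and '?' (an assumption for this sketch)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def aspect_sentences(review, seeds):
    """Keep sentences that contain at least one seed word as an exact token."""
    kept = []
    for sentence in split_sentences(review):
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        if tokens & seeds:
            kept.append(sentence)
    return kept


def leave_one_out_pairs(reviews, seeds):
    """Build (input sentences, pseudo-summary) pairs.

    Each review's aspect-related portion is held out once as the target,
    and the aspect-related portions of the remaining reviews form the input.
    """
    portions = [aspect_sentences(r, seeds) for r in reviews]
    pairs = []
    for i, target in enumerate(portions):
        if not target:
            continue  # this review says nothing about the aspect
        source = [s for j, p in enumerate(portions) if j != i for s in p]
        if source:
            pairs.append((source, " ".join(target)))
    return pairs


reviews = [
    "The room was clean. Breakfast was cold.",
    "Spotless bathroom! Great location.",
    "Staff were friendly.",
]
pairs = leave_one_out_pairs(reviews, SEED_WORDS["cleanliness"])
```

Because matching is done on whole tokens, a word such as "cleaning" does not match the seed "clean"; whether the paper's matching behaves this way is not stated in the abstract.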
Anthology ID:
2023.findings-eacl.142
Volume:
Findings of the Association for Computational Linguistics: EACL 2023
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1898–1911
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2023.findings-eacl.142/
DOI:
10.18653/v1/2023.findings-eacl.142
Cite (ACL):
Ming Shen, Jie Ma, Shuai Wang, Yogarshi Vyas, Kalpit Dixit, Miguel Ballesteros, and Yassine Benajiba. 2023. Simple Yet Effective Synthetic Dataset Construction for Unsupervised Opinion Summarization. In Findings of the Association for Computational Linguistics: EACL 2023, pages 1898–1911, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Simple Yet Effective Synthetic Dataset Construction for Unsupervised Opinion Summarization (Shen et al., Findings 2023)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2023.findings-eacl.142.pdf
Video:
https://preview.aclanthology.org/build-pipeline-with-new-library/2023.findings-eacl.142.mp4