Abstract
We address the problem of unsupervised extractive document summarization, especially for long documents. We model the unsupervised problem as a sparse auto-regression problem and approximate the resulting combinatorial problem with a convex, norm-constrained one, which we solve using a dedicated Frank-Wolfe algorithm. To generate a summary with k sentences, the algorithm only needs to execute approximately k iterations, making it very efficient for long documents. We evaluate our approach against two other unsupervised methods using both lexical (standard) ROUGE scores and semantic (embedding-based) ones. Our method achieves better results on both datasets and works especially well when combined with embeddings for highly paraphrased summaries.
- Anthology ID:
- 2020.sustainlp-1.8
- Volume:
- Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Nafise Sadat Moosavi, Angela Fan, Vered Shwartz, Goran Glavaš, Shafiq Joty, Alex Wang, Thomas Wolf
- Venue:
- sustainlp
- Publisher:
- Association for Computational Linguistics
- Pages:
- 54–62
- URL:
- https://aclanthology.org/2020.sustainlp-1.8
- DOI:
- 10.18653/v1/2020.sustainlp-1.8
- Cite (ACL):
- Alicia Tsai and Laurent El Ghaoui. 2020. Sparse Optimization for Unsupervised Extractive Summarization of Long Documents with the Frank-Wolfe Algorithm. In Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, pages 54–62, Online. Association for Computational Linguistics.
- Cite (Informal):
- Sparse Optimization for Unsupervised Extractive Summarization of Long Documents with the Frank-Wolfe Algorithm (Tsai & El Ghaoui, sustainlp 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2020.sustainlp-1.8.pdf
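
Illustrative sketch (not taken from the paper): the abstract states that a Frank-Wolfe algorithm yields a k-sentence summary in roughly k iterations. The Python below shows one way this can happen, under assumptions that are ours rather than the authors'. It assumes the sentences are columns of a matrix X, that the objective is 0.5 * ||X - XW||_F^2, and that the constraint is a row-sparsity-inducing ball (sum over rows of the row-wise max-norm at most tau). The function name `frank_wolfe_select` and the parameter `tau` are hypothetical. Under this constraint each Frank-Wolfe step activates at most one row of W, so k steps select at most k sentences.

```python
import numpy as np


def frank_wolfe_select(X, k, tau=1.0):
    """Pick at most k columns (sentences) of X with k Frank-Wolfe steps.

    Assumed formulation (not the paper's exact one):
        minimize 0.5 * ||X - X W||_F^2
        subject to sum_i max_j |W[i, j]| <= tau
    Rows of W that end up nonzero are the selected sentences.
    """
    n = X.shape[1]
    gram = X.T @ X                      # (n, n) sentence similarity matrix
    W = np.zeros((n, n))
    for _ in range(k):
        grad = gram @ W - gram          # gradient X^T (X W - X)
        # Linear minimization oracle for the assumed ball: the best vertex
        # puts all mass on the row whose gradient has the largest l1 norm.
        i = int(np.argmax(np.abs(grad).sum(axis=1)))
        S = np.zeros_like(W)
        S[i] = -tau * np.sign(grad[i])
        D = S - W                       # Frank-Wolfe direction
        XD = X @ D
        denom = float(np.sum(XD * XD))
        if denom <= 1e-12:              # direction does not change the fit
            break
        # Exact line search for the quadratic objective, clipped to [0, 1].
        gamma = float(np.clip(-np.sum(grad * D) / denom, 0.0, 1.0))
        W += gamma * D
    active = np.flatnonzero(np.abs(W).sum(axis=1) > 1e-12)
    return sorted(int(i) for i in active)


if __name__ == "__main__":
    # Random vectors stand in for sentence embeddings (purely synthetic).
    rng = np.random.default_rng(0)
    X_demo = rng.normal(size=(16, 40))  # 40 "sentences", 16-dim vectors
    print(frank_wolfe_select(X_demo, k=5))
```

The returned indices can be mapped back to the original sentences in document order. The sketch may return fewer than k indices, since a step can reselect an already active row or, when the step size reaches 1, drop earlier rows.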