Learning to Prioritize: Precision-Driven Sentence Filtering for Long Text Summarization
Alex Mei, Anisha Kabir, Rukmini Bapat, John Judge, Tony Sun, William Yang Wang
Abstract
Neural text summarization has shown great potential in recent years. However, current state-of-the-art summarization models are limited by their maximum input length, which makes it difficult to summarize longer texts comprehensively. As part of a layered summarization architecture, we introduce PureText, a simple yet effective pre-processing layer that removes low-quality sentences from articles to improve existing summarization models. Evaluating on popular datasets such as WikiHow and Reddit TIFU, we show absolute ROUGE-1 improvements of up to 3.84 and 8.57 points on the full test set and the long-article subset, respectively, for state-of-the-art summarization models such as BertSum and BART. Our approach provides downstream models with higher-quality sentences for summarization, improving overall model performance, especially on long articles.
- Anthology ID:
- 2022.lrec-1.33
- Volume:
- Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 313–318
- URL:
- https://aclanthology.org/2022.lrec-1.33
- Cite (ACL):
- Alex Mei, Anisha Kabir, Rukmini Bapat, John Judge, Tony Sun, and William Yang Wang. 2022. Learning to Prioritize: Precision-Driven Sentence Filtering for Long Text Summarization. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 313–318, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Learning to Prioritize: Precision-Driven Sentence Filtering for Long Text Summarization (Mei et al., LREC 2022)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2022.lrec-1.33.pdf
- Data:
- Reddit TIFU, WikiHow
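The abstract describes PureText only at a high level, as a pre-processing layer that drops low-quality sentences before a length-limited summarizer such as BertSum or BART sees the article. The sketch below illustrates where such a layer would sit. It is not the authors' method. The per-sentence scorer `score`, the `threshold`, the whitespace-token budget `max_tokens`, and the toy scorer `toy_score` are all hypothetical stand-ins for whatever quality model and tokenizer a real pipeline would use.

```python
from typing import Callable, List


def filter_sentences(
    sentences: List[str],
    score: Callable[[str], float],
    threshold: float = 0.5,
    max_tokens: int = 512,
) -> List[str]:
    """Hypothetical filtering layer: keep sentences whose score clears
    `threshold`, in their original order, until a whitespace-token
    budget `max_tokens` (standing in for the summarizer's input limit) is full."""
    kept: List[str] = []
    used = 0
    for sent in sentences:
        if score(sent) < threshold:
            continue  # drop sentences the (assumed) scorer judges low-quality
        n = len(sent.split())  # crude whitespace token count, an assumption
        if used + n > max_tokens:
            break  # the downstream model's input window is full
        kept.append(sent)
        used += n
    return kept


def toy_score(sentence: str) -> float:
    """Hypothetical scorer: treats very short sentences as low-quality."""
    return 0.0 if len(sentence.split()) < 4 else 1.0


article = [
    "Preheat the oven to 180 degrees.",
    "Yum!",
    "Mix flour and sugar in a bowl.",
    "Lol.",
]
filtered = filter_sentences(article, toy_score, max_tokens=50)
# The filtered sentences would then be joined and passed to the summarizer.
print(filtered)
```

Keeping the surviving sentences in their original order means the downstream summarizer still sees a coherent, if shorter, article; only the choice of which sentences survive changes.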