Rukmini Bapat
2022
Learning to Prioritize: Precision-Driven Sentence Filtering for Long Text Summarization
Alex Mei | Anisha Kabir | Rukmini Bapat | John Judge | Tony Sun | William Yang Wang
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Neural text summarization has shown great potential in recent years. However, current state-of-the-art summarization models are limited by their maximum input length, posing a challenge to summarizing longer texts comprehensively. As part of a layered summarization architecture, we introduce PureText, a simple yet effective pre-processing layer that removes low-quality sentences in articles to improve existing summarization models. When evaluated on popular datasets like WikiHow and Reddit TIFU, we show up to 3.84 and 8.57 point ROUGE-1 absolute improvement on the full test set and the long article subset, respectively, for state-of-the-art summarization models such as BertSum and BART. Our approach provides downstream models with higher-quality sentences for summarization, improving overall model performance, especially on long text articles.