Max Savery


2020

pdf
Towards Zero-Shot Conditional Summarization with Adaptive Multi-Task Fine-Tuning
Travis Goodwin | Max Savery | Dina Demner-Fushman
Findings of the Association for Computational Linguistics: EMNLP 2020

Automatic summarization research has traditionally focused on providing high quality general-purpose summaries of documents. However, there are many applications which require more specific summaries, such as supporting question answering or topic-based literature discovery. In this paper we study the problem of conditional summarization in which content selection and surface realization are explicitly conditioned on an ad-hoc natural language question or topic description. Because of the difficulty in obtaining sufficient reference summaries to support arbitrary conditional summarization, we explore the use of multi-task fine-tuning (MTFT) on twenty-one natural language tasks to enable zero-shot conditional summarization on five tasks. We present four new summarization datasets, two novel “online” or adaptive task-mixing strategies, and report zero-shot performance using T5 and BART, demonstrating that MTFT can improve zero-shot summarization quality.

pdf
Flight of the PEGASUS? Comparing Transformers on Few-shot and Zero-shot Multi-document Abstractive Summarization
Travis Goodwin | Max Savery | Dina Demner-Fushman
Proceedings of the 28th International Conference on Computational Linguistics

Recent work has shown that pre-trained Transformers obtain remarkable performance on many natural language processing tasks including automatic summarization. However, most work has focused on (relatively) data-rich single-document summarization settings. In this paper, we explore highly-abstractive multi-document summarization where the summary is explicitly conditioned on a user-given topic statement or question. We compare the summarization quality produced by three state-of-the-art transformer-based models: BART, T5, and PEGASUS. We report the performance on four challenging summarization datasets: three from the general domain and one from consumer health in both zero-shot and few-shot learning settings. While prior work has shown significant differences in performance for these models on standard summarization tasks, our results indicate that with as few as 10 labeled examples there is no statistically significant difference in summary quality, suggesting the need for more abstractive benchmark collections when determining state-of-the-art.